[XeTeX] XeTeX: ready for full-time math?

Jonathan Kew jonathan_kew at sil.org
Tue Aug 22 11:43:01 CEST 2006

On 22 Aug 2006, at 4:59 am, Ross Moore wrote:

> Hi Jonathan,
> In a different thread ...
> On 06/08/2006, at 8:40 AM, Jonathan Kew wrote:
>> One way to achieve this, noting that maqqef is Unicode character
>> U+05BE, would be to make this an "active character" and program it to
>> insert a discretionary break after itself. Something like this:
>>    \catcode"05BE = \active
>>    \def^^^^05be{\char"05BE\discretionary{}{}{}}
>> (The use of uppercase "05BE" and lowercase "05be" here is deliberate
>> and required!)
> Can the ^^^^ notation be used/extended to create a token for
> plane-1 (or higher-plane) characters ?

Not currently; at this level, you're directly addressing the UTF-16  
codes that are used to represent Unicode text during processing. So a  
Plane 1 character could be represented as a surrogate pair (two  
^^^^xxxx codes), but that's not a single token; it's two.
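
To make the surrogate-pair point concrete, here is a small Python sketch (not XeTeX code; the function name utf16_code_units is made up for illustration) of the standard UTF-16 arithmetic that splits a supplementary-plane code point into two 16-bit code units. It only illustrates why one Plane 1 character ends up as two ^^^^xxxx codes; it says nothing about how XeTeX implements this internally.

```python
def utf16_code_units(code_point):
    """Return the UTF-16 code units for a Unicode code point as a list of ints."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if code_point < 0x10000:
        # Plane 0 (BMP): a single 16-bit code unit.
        return [code_point]
    # Supplementary planes: split the 20-bit offset into two 10-bit halves.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # high (leading) surrogate
    low = 0xDC00 + (offset & 0x3FF)   # low (trailing) surrogate
    return [high, low]


# Print each code point the way it would look in ^^^^ notation.
for cp in (0x05BE, 0x1D51A):
    units = utf16_code_units(cp)
    print("U+%04X ->" % cp, " ".join("^^^^%04x" % u for u in units))
```

For U+05BE this yields one code unit (05BE); for U+1D51A it yields the pair D835 DD1A, i.e. two separate codes rather than a single token.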

> e.g.,  mathematics needs to access Plane-1
>    U+1D515   (fraktur R)

(Oops.... fraktur R is actually U+211C, and U+1D515 is unassigned!  
But that's incidental to the point, of course.)

> I can declare a macro  \frakR  to have an expansion
> that includes  \char"1D515 .

Right; \char accepts the full Unicode range, and generates surrogate  
pairs internally when appropriate.

> But I want to do also:
>    \catcode"1D515\active
>    \let^^^^^1d515\frakR
> so that a character that is input directly will have
> the same result as using the named macro.

And this is what you can't (currently) do. The \catcode table (along  
with \lccode, \uccode, \sfcode, and \mathcode) contains values for  
the UTF-16 code values (i.e., "0000 ... "FFFF), so you can't make a  
non-Plane 0 character \active, for instance.

I could consider extending this to fully support higher planes, but  
my question would be whether this is really needed. Do we expect to  
see *input* text that uses the Plane 1 math characters directly? I  
would have thought it more likely that they will either be  
represented by markup (e.g., MathML entities from some authoring  
system, which could map to TeX control sequences like \frakR), or  
keyed as plain ASCII letters that are intended to appear in a  
particular style/family that maps to Plane 1 characters in a Unicode  
math font.

Thus, it seems likelier to me that someone will want to do

   \textfont\fracfam = \MyUnicodeMathFontWithFrakturMapping

   $ \fracfam W $ % accesses U+1D51A from the math font

as typing 𝔚 directly in the source text is not usually very  

(I wonder how many people's email clients will show that properly!)
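
As a rough illustration of the "ASCII letter in a Fraktur family" idea, the following Python sketch shows the kind of mapping such a font or family setup would have to encode. The names BMP_FRAKTUR and fraktur_capital are hypothetical, and this is not how any particular font or XeTeX itself does it. It also shows why fraktur R lives at U+211C: a few Fraktur capitals were already encoded in the BMP, so their slots in the Plane 1 block are left unassigned.

```python
# Fraktur capitals that Unicode encodes in the BMP (Letterlike Symbols);
# the corresponding positions in the Plane 1 block are unassigned "holes".
BMP_FRAKTUR = {"C": 0x212D, "H": 0x210C, "I": 0x2111, "R": 0x211C, "Z": 0x2128}

# MATHEMATICAL FRAKTUR CAPITAL A; the other capitals follow in alphabetical order.
FRAKTUR_CAPITAL_A = 0x1D504


def fraktur_capital(letter):
    """Map an ASCII capital letter to the code point of its Fraktur form."""
    if len(letter) != 1 or not "A" <= letter <= "Z":
        raise ValueError("expected a single ASCII capital letter")
    if letter in BMP_FRAKTUR:
        return BMP_FRAKTUR[letter]
    return FRAKTUR_CAPITAL_A + (ord(letter) - ord("A"))


for ch in "WR":
    print(ch, "-> U+%04X" % fraktur_capital(ch))
```

With this mapping W goes to U+1D51A (Plane 1), while R goes to U+211C (Plane 0), matching the code points mentioned above.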

This is open for discussion, though.... if there's a real desire to  
use the higher-plane math letters directly in input text, not only  
access them in Unicode fonts, then an extension of these code tables  
may be appropriate.

> But I want even more than this, since I want to be able
> to include coding, such as above, inside a macro-expansion.
> e.g. using a declaration such as:
>     \DeclareActiveMathCharacter{1D515}{\frakR}{\mathord}
> being expanded using a meta-macro something like:
>   \def\DeclareActiveMathCharacter#1#2#3{%
>    \def#2{\ensuremath{\text{#3{\char"#1}}}}%
>    \catcode"#1\active \lowercase{\let^^^^^#1#2}%
>   }
> This currently cannot work, even with just 4 hex-bits,
> since the  ^^^^ is resolved on input of the macro-code,
> whereas I need it to be delayed until the
> meta-macro  \DeclareActiveMathCharacter  is called.
> Can this idea be made to work ?

Not in this form, at least... ^^ (and its extension ^^^^, and  
potentially ^^^^^^) is, as you say, resolved during the initial text  
input. Changing that would break a great deal of stuff; and I'd be  
reluctant to invent a whole new, parallel mechanism for later  
resolution of an "escape" sequence like this.

I think it's possible to define such active characters by other  
approaches, though, such as by making use of \lccode changes to get  
to the character you want. E.g., something along the lines of  

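     \count255=\lccode`\~ % save the current \lccode of ~ (assuming ~ is active)
     \lccode`\~="#1 \lowercase{\let~#2}% ~ becomes active char "#1 before \let
     \lccode`\~=\count255 % restore the saved \lccode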

Here, #1 will currently be limited to FFFF, of course, because there  
are no \catcodes or \lccodes beyond this. That's my question  
(above).... do people really feel a need for these? Will  
supplementary-plane math letters become commonplace *as literal  
characters in input text*? Comments invited!

