[XeTeX] XeTeX: ready for full-time math?
Jonathan Kew
jonathan_kew at sil.org
Tue Aug 22 11:43:01 CEST 2006
On 22 Aug 2006, at 4:59 am, Ross Moore wrote:
> Hi Jonathan,
>
> In a different thread ...
>
> On 06/08/2006, at 8:40 AM, Jonathan Kew wrote:
>> One way to achieve this, noting that maqqef is Unicode character
>> U+05BE, would be to make this an "active character" and program it to
>> insert a discretionary break after itself. Something like this:
>>
>> \catcode"05BE = \active
>> \def^^^^05be{\char"05BE\discretionary{}{}{}}
>>
>> (The use of uppercase "05BE" and lowercase "05be" here is deliberate
>> and required!)
>
> Can the ^^^^ notation be used/extended to create a token for
> plane-1 (or higher-plane) characters ?
Not currently; at this level, you're directly addressing the UTF-16
codes that are used to represent Unicode text during processing. So a
Plane 1 character could be represented as a surrogate pair (two
^^^^xxxx codes), but that's not a single token; it's two.
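To make the surrogate-pair arithmetic concrete, here is a rough sketch in plain TeX of how a Plane 1 code maps to two UTF-16 codes. The register and macro names (\usv, \hisurr, \losurr, \surrogatepair) are made up for illustration; they are not part of XeTeX, and this only mirrors the standard UTF-16 rule rather than anything XeTeX exposes.

```tex
% Hypothetical helper names, for illustration only.
\newcount\usv     % code point, then its offset from U+10000
\newcount\hisurr  % high (leading) surrogate
\newcount\losurr  % low (trailing) surrogate
\def\surrogatepair#1{%
  \usv="#1 %
  \advance\usv by -"10000 %                   offset into the supplementary planes
  \hisurr=\usv \divide\hisurr by "400 %       top 10 bits (\divide truncates)
  \losurr=\hisurr \multiply\losurr by -"400 %
  \advance\losurr by \usv                   % bottom 10 bits = offset - "400 * top
  \advance\hisurr by "D800 %
  \advance\losurr by "DC00 }

\surrogatepair{1D51A}
\message{\the\hisurr, \the\losurr}% 55349, 56602, i.e. "D835 and "DD1A
```

So U+1D51A would have to be written as ^^^^d835^^^^dd1a, which TeX reads as two separate character tokens.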
>
> e.g., mathematics needs to access Plane-1
> U+1D515 (fraktur R)
(Oops.... fraktur R is actually U+211C, and U+1D515 is unassigned!
But that's incidental to the point, of course.)
>
> I can declare a macro \frakR to have an expansion
> that includes \char"1D515 .
Right; \char accepts the full Unicode range, and generates surrogate
pairs internally when appropriate.
> But I want to do also:
>
> \catcode"1D515\active
> \let^^^^^1d515\frakR
>
> so that a character that is input directly will have
> the same result as using the named macro.
And this is what you can't (currently) do. The \catcode table (along
with \lccode, \uccode, \sfcode, and \mathcode) is indexed by UTF-16
code values (i.e., "0000 ... "FFFF), so you can't make a non-Plane 0
character \active, for instance.
I could consider extending this to fully support higher planes, but
my question would be whether this is really needed. Do we expect to
see *input* text that uses the Plane 1 math characters directly? I
would have thought it more likely that they will either be
represented by markup (e.g., MathML entities from some authoring
system, which could map to TeX control sequences like \frakR), or
keyed as plain ASCII letters that are intended to appear in a
particular style/family that maps to Plane 1 characters in a Unicode
math font.
Thus, it seems likelier to me that someone will want to do
\newfam\fracfam
\textfont\fracfam = \MyUnicodeMathFontWithFrakturMapping
$ \fam\fracfam W $ % accesses U+1D51A from the math font
as typing 𝔚 directly in the source text is not usually very
convenient.
(I wonder how many people's email clients will show that properly!)
This is open for discussion, though.... if there's a real desire to
use the higher-plane math letters directly in input text, not only
access them in Unicode fonts, then an extension of these code tables
may be appropriate.
> But I want even more than this, since I want to be able
> to include coding, such as above, inside a macro-expansion.
>
> e.g. using a declaration such as:
>
> \DeclareActiveMathCharacter{1D515}{\frakR}{\mathord}
>
> being expanded using a meta-macro something like:
>
> \def\DeclareActiveMathCharacter#1#2#3{%
> \def#2{\ensuremath{\text{#3{\char"#1}}}}%
> \catcode"#1\active \lowercase{\let^^^^^#1#2}%
> }
>
> This currently cannot work, even with just 4 hex-bits,
> since the ^^^^ is resolved on input of the macro-code,
> whereas I need it to be delayed until the
> meta-macro \DeclareActiveMathCharacter is called.
>
> Can this idea be made to work ?
Not in this form, at least... ^^ (and its extension ^^^^, and
potentially ^^^^^^) is, as you say, resolved during the initial text
input. Changing that would break a great deal of stuff; and I'd be
reluctant to invent a whole new, parallel mechanism for later
resolution of an "escape" sequence like this.
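A two-line illustration of that early resolution, assuming the usual plain TeX or LaTeX catcodes (^ as superscript); the macro name \a is arbitrary:

```tex
\def\a{^^41}          % the tokenizer has already turned ^^41 into the letter A
\message{\meaning\a}  % macro:->A
```

By the time a macro body is stored there is no ^^ left in it, so a #1 substituted later has nothing to combine with.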
I think it's possible to define such active characters by other
approaches, though, such as by making use of \lccode changes to get
to the character you want. E.g., something along the lines of
(untested):
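\def\DeclareActiveMathCharacter#1#2#3{%
\def#2{\ensuremath{\text{#3{\char"#1}}}}%
\catcode"#1=\active % make the literal input character active
\count255=\lccode`\~ % save the \lccode of ~ (assuming ~ is active)
\lccode`\~="#1 \lowercase{\let~#2}% ~ is lowercased to character "#1
\lccode`\~=\count255 % restore the saved \lccode
}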
Here, #1 will currently be limited to FFFF, of course, because there
are no \catcodes or \lccodes beyond this. That's my question
(above).... do people really feel a need for these? Will
supplementary-plane math letters become commonplace *as literal
characters in input text*? Comments invited!
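For what it's worth, the \lccode trick is often written with a group rather than a scratch register, so that the \lccode change is undone automatically. A sketch along those lines (also untested here), using the Plane 0 character U+211C; the definition given for \frakR is only a placeholder, and ~ is assumed to be active when the code is read:

```tex
% Assumes ~ is active when this is read (true in plain TeX and LaTeX).
\def\frakR{\char"211C }              % placeholder definition, for illustration
\catcode"211C=\active                % the literal character now acts as a macro
\begingroup
  \lccode`\~="211C                   % inside the group, ~ lowercases to U+211C
  \lowercase{\endgroup \let~}\frakR  % i.e. \let <active U+211C> = \frakR
```

The \endgroup inside \lowercase restores the old \lccode before the \let is executed, so the assignment itself happens outside the group and survives.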
JK