[XeTeX] XeTeX maintenance

Douglas McKenna doug at mathemaesthetics.com
Mon Apr 27 02:05:45 CEST 2015


Joseph Wright wrote:

> \def\"{0}\expandafter\def\csname^^^^^00022\endcsname{1}
> \ifnum\"=0 \message{tex82}\else\message{newstuff}\fi

When I implemented a Unicode escape-sequence extension using double-caret notation in the JSBox TeX-language interpreter I've been working on (which is 21-bit Unicode internally at all times, but can be configured at run time to accept 8-bit input only), I was unaware of what XeTeX had implemented, so I simply used

^^uxxxx (for 16-bit, BMP codes)
^^Uxxxxxx (for all 21-bit Unicode code points)

Seemed straightforward enough.

In the first case, if any of the four x's is not a lowercase hex digit, interpretation reverts to the standard TeX escape sequence ^^u (ASCII '5'), followed by the four input characters, at least one of which is not a hex digit. The six-digit case works the same way: if any of the six characters following ^^U is not a hex digit, ^^U reverts to its standard TeX meaning (character 0x15, since 'U' is 0x55 and TeX subtracts 64).
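A minimal Python sketch of the decoding rule just described, offered as an illustration rather than JSBox's actual code; `decode_carets` and `_hex_run` are hypothetical names. The fallback is TeX82's standard ^^ rule (two lowercase hex digits give that character code; otherwise a character c below 128 becomes c+64 or c-64), simplified to assume ^ is the only superscript character and to make a single left-to-right pass (TeX itself re-examines the character it produces). Rejecting ^^U values above 0x10FFFF is an added assumption, since six hex digits can exceed the Unicode range.

```python
HEX = "0123456789abcdef"  # TeX's ^^ notation accepts only lowercase hex digits


def _hex_run(text: str, start: int, width: int):
    """Return the value of exactly `width` lowercase hex digits at `start`, else None."""
    chunk = text[start:start + width]
    if len(chunk) == width and all(ch in HEX for ch in chunk):
        return int(chunk, 16)
    return None


def decode_carets(text: str) -> str:
    """Expand ^^ escapes: the ^^u/^^U extension first, then TeX82's rules."""
    out = []
    i, n = 0, len(text)
    while i < n:
        if text.startswith("^^", i) and i + 2 < n:
            c = text[i + 2]
            if c == "u":
                # Extension: ^^u + four lowercase hex digits -> BMP code point
                cp = _hex_run(text, i + 3, 4)
                if cp is not None:
                    out.append(chr(cp))
                    i += 7
                    continue
            elif c == "U":
                # Extension: ^^U + six lowercase hex digits -> any code point
                # (the 0x10FFFF cap is an assumption of this sketch)
                cp = _hex_run(text, i + 3, 6)
                if cp is not None and cp <= 0x10FFFF:
                    out.append(chr(cp))
                    i += 9
                    continue
            # TeX82: ^^ followed by two lowercase hex digits
            cp = _hex_run(text, i + 2, 2)
            if cp is not None:
                out.append(chr(cp))
                i += 4
                continue
            # TeX82: ^^c with c < 128 becomes c+64 or c-64 (so ^^u -> '5', ^^U -> 0x15)
            code = ord(c)
            if code < 128:
                out.append(chr(code + 64 if code < 64 else code - 64))
                i += 3
                continue
        out.append(text[i])
        i += 1
    return "".join(out)
```

For example, `decode_carets("^^u00e9")` yields é, while `decode_carets("^^uxyz1")` falls back to `5xyz1`, exactly as TeX82 would read it.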

Given that the number of TeX input files using ^^u is likely minuscule, and the number of those that follow ^^u or ^^U with four or six hex digits is even smaller, the compatibility cost seemed well worth the benefit. Maybe there's something I haven't thought through.
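One way to test that compatibility estimate against a real corpus is to count only the sequences whose meaning would change. A rough sketch, assuming the files are plain text and that only lowercase hex digits trigger the extension; `affected_sequences` and `scan_tree` are hypothetical helpers, not part of any existing tool:

```python
import re
from pathlib import Path

# Sequences whose meaning would change under the ^^u/^^U extension.
# In TeX82 these read as ^^u ('5') or ^^U (0x15) followed by ordinary characters.
AFFECTED = re.compile(r"\^\^(?:u[0-9a-f]{4}|U[0-9a-f]{6})")


def affected_sequences(text: str) -> list:
    """Return every ^^u/^^U sequence in `text` that the extension would reinterpret."""
    return AFFECTED.findall(text)


def scan_tree(root: str, pattern: str = "*.tex") -> dict:
    """Map each matching file under `root` to its number of affected sequences."""
    counts = {}
    for path in Path(root).rglob(pattern):
        if not path.is_file():
            continue
        # latin-1 maps every byte to a character, so decoding never fails
        text = path.read_text(encoding="latin-1")
        hits = affected_sequences(text)
        if hits:
            counts[str(path)] = len(hits)
    return counts
```

Running `scan_tree` over a TeX distribution tree returns a mapping from file path to match count; an empty result would support the estimate above.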

This discussion I just found is both pertinent and, I suppose, frightening:

http://stackroulette.com/tex/62725/the-notation-in-various-engines


Doug McKenna



