[XeTeX] XeTeX maintenance

Joseph Wright joseph.wright at morningstar2.co.uk
Mon Apr 27 08:41:45 CEST 2015

On 27/04/2015 01:05, Douglas McKenna wrote:
> Joseph Wright wrote:
>> \def\"{0}\expandafter\def\csname^^^^^00022\endcsname{1}
>> \ifnum\"=0 \message{tex82}\else\message{newstuff}\fi
> When I implemented a Unicode escape sequence extension using double-caret notation in the JSBox TeX-language interpreter I've been working on (which is all 21-bit Unicode internally, all the time, but can be configured at run-time to be 8-bit input only), I was unaware of what XeTeX had implemented, so I just used
> ^^uxxxx (for 16-bit, BMP codes)
> ^^Uxxxxxx (for all 21-bit Unicode code points)
> Seemed straightforward enough.

XeTeX's conventions here have been picked up by LuaTeX, and there has
been some 'feedback' from LuaTeX to XeTeX, giving us some
standardisation of Unicode primitives and syntax (admittedly with bugs, but
that's a different point). I'd hope that any future Unicode TeX-like
systems would also adopt the model used by XeTeX/LuaTeX.
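For comparison, here is a minimal Python sketch that puts the two escape syntaxes side by side. The first is the XeTeX/LuaTeX form: four carets plus four hex digits for the BMP, or six carets plus six hex digits for any code point. The second is the ^^u/^^U form quoted above. The function name `unicode_escape` and the scheme labels are hypothetical. The sketch covers only the four- and six-caret forms and assumes lowercase hex digits for both schemes, which is an assumption for the ^^u form. It is not either engine's actual code.

```python
HEX = "0123456789abcdef"  # lowercase only, as in TeX's ^^xy rule (assumed for ^^u too)

def _hex_at(s: str, start: int, n: int):
    """Return the value of exactly n lowercase hex digits at s[start:], else None."""
    chunk = s[start:start + n]
    if len(chunk) == n and all(ch in HEX for ch in chunk):
        return int(chunk, 16)
    return None

def unicode_escape(s: str, scheme: str):
    """Decode a Unicode escape at the start of s (hypothetical helper).

    Returns (code_point, chars_consumed), or None if s does not start
    with an escape of the given scheme.
      scheme "xetex":   ^^^^xxxx (BMP) or ^^^^^^xxxxxx (any code point)
      scheme "caret_u": ^^uxxxx  (BMP) or ^^Uxxxxxx    (any code point)
    """
    if scheme == "xetex":
        forms = [("^^^^^^", 6), ("^^^^", 4)]  # try the longer prefix first
    elif scheme == "caret_u":
        forms = [("^^U", 6), ("^^u", 4)]
    else:
        raise ValueError(f"unknown scheme: {scheme!r}")
    for prefix, ndigits in forms:
        if s.startswith(prefix):
            value = _hex_at(s, len(prefix), ndigits)
            # keep only values inside the Unicode code space
            if value is not None and value <= 0x10FFFF:
                return value, len(prefix) + ndigits
    return None
```

Both forms reach the same code points. They differ only in which input sequences they claim, so `unicode_escape("^^^^0022", "xetex")` and `unicode_escape("^^u0022", "caret_u")` both yield U+0022, while each scheme ignores the other's syntax.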

> Given that the number of TeX input files using ^^u is likely minuscule, and the number of those that follow the ^^u or ^^U with four or six hex digits is even smaller, it seemed like a worthwhile benefit vs. cost, compatibility-wise. Maybe there's something I've not thought out well.

I didn't mean that there would be many real-world docs with this issue.
I was trying to point out that it's almost impossible to imagine that a
Unicode TeX-like engine could be used as a drop-in replacement for the
current 8-bit ones (pdfTeX most obviously), so when we talk about 'the
future' we have to mean 'for documents written assuming Unicode' rather
than 'for all existing TeX documents'. (For mathematicians the latter
point is very important.)
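The following minimal Python sketch makes the drop-in point concrete. It implements the classic TeX82 ^^ reduction that 8-bit engines such as pdfTeX use, as described in The TeXbook. There are two cases: ^^ followed by two lowercase hex digits gives that character code, and otherwise ^^ followed by an ASCII character c gives c shifted by 64. The function name `tex82_reduce` is hypothetical. The sketch also simplifies: it assumes `^` is the only catcode-7 character and makes a single left-to-right pass without TeX's re-scan of the result.

```python
HEX = "0123456789abcdef"  # TeX82 only treats lowercase a-f as hex here

def tex82_reduce(s: str) -> str:
    """Apply TeX82-style ^^ reduction in one left-to-right pass (simplified sketch)."""
    out = []
    i = 0
    while i < len(s):
        # need "^^" followed by at least one ASCII character
        if s.startswith("^^", i) and i + 2 < len(s) and ord(s[i + 2]) < 128:
            pair = s[i + 2:i + 4]
            if len(pair) == 2 and all(ch in HEX for ch in pair):
                out.append(chr(int(pair, 16)))  # ^^xy: two lowercase hex digits
                i += 4
            else:
                c = ord(s[i + 2])  # ^^c: add 64 below code 64, else subtract 64
                out.append(chr(c + 64 if c < 64 else c - 64))
                i += 3
        else:
            out.append(s[i])
            i += 1
    return "".join(out)
```

Under these rules `tex82_reduce("^^u0022")` gives `"50022"`, and `tex82_reduce("^^^^0022")` gives character 30 followed by `^0022`. Input written for either Unicode escape form therefore quietly means something different to an 8-bit engine.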

> This discussion I just found is both pertinent and frightening, I suppose:
> http://stackroulette.com/tex/62725/the-notation-in-various-engines

That's a (questionable) reuse of the info from


Note that the discussion is editable (wiki-like) and to my knowledge is
still correct as-is. There are some tricky issues in XeTeX, particularly
with non-BMP characters, partly because working out what should happen
there has been a work in progress.

Joseph Wright