[XeTeX] xetex and the unicode bidirectional algorithm.

mskala at ansuz.sooke.bc.ca mskala at ansuz.sooke.bc.ca
Mon Dec 9 15:16:03 CET 2013

On Mon, 9 Dec 2013, Philip Taylor wrote:
> Keith -- could you possibly supply an example of
> "a properly encoded utf-8 string" from which it
> can be unambiguously determined whether the string
> "sang" is an English word (the past tense of "sing")

I'll probably regret pointing this out, and the characters involved (the
Plane 14 language tag characters) have been deprecated since Unicode 5.1,
but:

   U+E0001 U+E0065 U+E006E U+0073 U+0061 U+006E U+0067

or in UTF-8 bytes:

   f3 a0 80 81 f3 a0 81 a5 f3 a0 81 ae 73 61 6e 67
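
As a rough check of the sequence above, here is a minimal Python sketch that
builds the string from the listed code points, prints its UTF-8 bytes, and
reads the language back out of the leading tag characters.  The helper name
language_tag is made up for illustration, and the decoding rule (a tag
character is its ASCII counterpart plus 0xE0000) is my reading of how Plane 14
tags work, not something taken from any particular library.

```python
# Code points copied from the message: U+E0001 (LANGUAGE TAG), two tag
# letters, then the ordinary ASCII letters of "sang".
code_points = [0xE0001, 0xE0065, 0xE006E, 0x0073, 0x0061, 0x006E, 0x0067]

text = "".join(chr(cp) for cp in code_points)
utf8_hex = " ".join(f"{b:02x}" for b in text.encode("utf-8"))


def language_tag(s):
    """Hypothetical helper: return the language named by a leading
    U+E0001 tag run, or None if the string does not start with one."""
    if not s.startswith("\U000E0001"):
        return None
    letters = []
    for ch in s[1:]:
        # Assumption: tag characters U+E0020..U+E007E mirror ASCII 0x20..0x7E.
        if 0xE0020 <= ord(ch) <= 0xE007E:
            letters.append(chr(ord(ch) - 0xE0000))
        else:
            break
    return "".join(letters)


print(utf8_hex)            # f3 a0 80 81 f3 a0 81 a5 f3 a0 81 ae 73 61 6e 67
print(language_tag(text))  # en
```

Every tag character here has a code point above U+FFFF, so a form that
rejects everything outside the Basic Multilingual Plane would drop all three
and leave only the plain "sang".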

The Web form you mentioned sanitizes away the special characters.  I don't
think that's unique to "tags" - it seems to also block everything outside
the Basic Multilingual Plane.  Bad form for something claiming to be an
authoritative analyser of Unicode strings.
Matthew Skala
mskala at ansuz.sooke.bc.ca                 People before principles.
