[XeTeX] xetex and the unicode bidirectional algorithm.
mskala at ansuz.sooke.bc.ca
Mon Dec 9 15:16:03 CET 2013
On Mon, 9 Dec 2013, Philip Taylor wrote:
> Keith -- could you possibly supply an example of
> "a properly encoded utf-8 string" from which it
> can be unambiguously determined whether the string
> "sang" is an English word (the past tense of "sing")
I'll probably regret pointing this out, and the characters involved have
been deprecated since Unicode 5, but:
U+E0001 U+E0065 U+E006E U+0073 U+0061 U+006E U+0067
or in UTF-8 bytes:
f3 a0 80 81 f3 a0 81 a5 f3 a0 81 ae 73 61 6e 67
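For anyone who wants to check the byte sequence above, here is a minimal Python sketch. It builds the string from the listed code points, encodes it as UTF-8, and reads the language identifier back out of the tag characters. The helper name `language_tag` is made up for this illustration, and the decoding assumes only what the tag block provides: U+E0001 introduces a language tag, and the tag characters U+E0020..U+E007E mirror ASCII at an offset of 0xE0000.

```python
# Code points from the message: LANGUAGE TAG (U+E0001), tag letters "e" and "n",
# followed by the plain ASCII letters of "sang".
code_points = [0xE0001, 0xE0065, 0xE006E, 0x0073, 0x0061, 0x006E, 0x0067]

tagged = "".join(chr(cp) for cp in code_points)
utf8_hex = " ".join(f"{b:02x}" for b in tagged.encode("utf-8"))


def language_tag(text):
    """Return the identifier spelled by a leading U+E0001 language tag, or None.

    Hypothetical helper for illustration only; it is not part of any library.
    """
    if not text.startswith("\U000E0001"):
        return None
    letters = []
    for ch in text[1:]:
        cp = ord(ch)
        if 0xE0020 <= cp <= 0xE007E:
            # Tag characters mirror ASCII, shifted up by 0xE0000.
            letters.append(chr(cp - 0xE0000))
        else:
            break
    return "".join(letters)


print(utf8_hex)
print(language_tag(tagged))
```

The first line printed should match the byte listing above, and the second should be the identifier spelled by the tag characters.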
The Web form you mentioned sanitizes away the special characters. I don't
think that's unique to "tags" - it seems to also block everything outside
the Basic Multilingual Plane. Bad form for something claiming to be an
authoritative analyser of Unicode strings.
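How the Web form is actually implemented is not known. The sketch below only illustrates the effect described: a filter that drops everything above U+FFFF removes the language tag and leaves the bare word. The function `strip_non_bmp` is a hypothetical stand-in, not the form's real code.

```python
def strip_non_bmp(text):
    """Drop every character outside the Basic Multilingual Plane.

    Hypothetical stand-in for the kind of sanitizing the form appears to do.
    """
    return "".join(ch for ch in text if ord(ch) <= 0xFFFF)


# LANGUAGE TAG, tag "e", tag "n", then the ASCII word.
tagged = "\U000E0001\U000E0065\U000E006Esang"
stripped = strip_non_bmp(tagged)

print(len(tagged), len(stripped))
print(stripped)
```

After such a filter the three tag characters are gone, so nothing in the remaining string says which language "sang" belongs to.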
--
Matthew Skala
mskala at ansuz.sooke.bc.ca People before principles.
http://ansuz.sooke.bc.ca/