[tex-live] Critical bugfix needed for XeTeX [was: Re: Fwd: Duplicate Thai patterns reported by XeTeX in TL 2015 pretest]
jfkthame at gmail.com
Thu Apr 16 15:18:45 CEST 2015
The problem reported with Thai patterns, as well as the Korean problem
mentioned by Dohyun Kim (incorrect \showthe output from a token list
that should contain a couple of Korean characters), turns out to be a
symptom of an input-processing bug.
This is a critical issue inasmuch as it can result in silently
discarding certain characters from the user's input, with no indication
that typesetting has failed in any way.
Any character whose UTF-16 code unit has 0x20 or 0x09 as its low byte
could be affected.
The problem arises from the (unsigned char) typecasts added in TL
Combined with the "fake" isascii() definition found at:
which will override the system-defined isascii() if it is a function
rather than a macro, this makes ISBLANK(c) return true for any UTF-16
code unit with 0x09 or 0x20 in the lower byte, regardless of its upper
byte. This means that the code to "trim trailing whitespace" at:
will also "trim" various other non-whitespace characters, such as
Latin-script ĉ and Ġ, Cyrillic Љ and Р, Devanagari ठ, Thai ภ and many more.
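To make the failure mode concrete, here is a minimal sketch of how the
truncating cast combines with an always-true isascii(). The names
FAKE_ISASCII, ISBLANK_BUGGY, is_blank_buggy and check_buggy_samples are
my own illustrative stand-ins, not the actual definitions in the tree,
and the checks assume the default "C" locale:

```c
#include <assert.h>
#include <ctype.h>

/* Illustrative stand-ins, NOT copied from the TeX Live sources:
   an isascii() replacement that accepts everything, and a blank test
   that casts its argument down to a single byte. */
#define FAKE_ISASCII(c)  1
#define ISBLANK_BUGGY(c) (FAKE_ISASCII(c) && isblank((unsigned char)(c)))

/* Returns nonzero if the buggy test treats code unit c as blank. */
int is_blank_buggy(int c)
{
    return ISBLANK_BUGGY(c) != 0;
}

/* Checks the characters named in the message ("C" locale assumed). */
void check_buggy_samples(void)
{
    assert(is_blank_buggy(0x0020));   /* SPACE: genuinely blank       */
    assert(is_blank_buggy(0x0009));   /* TAB: genuinely blank         */
    assert(is_blank_buggy(0x0109));   /* ĉ: low byte 0x09, misjudged  */
    assert(is_blank_buggy(0x0120));   /* Ġ: low byte 0x20, misjudged  */
    assert(is_blank_buggy(0x0409));   /* Љ: low byte 0x09, misjudged  */
    assert(is_blank_buggy(0x0420));   /* Р: low byte 0x20, misjudged  */
    assert(is_blank_buggy(0x0920));   /* ठ: low byte 0x20, misjudged  */
    assert(is_blank_buggy(0x0E20));   /* ภ: low byte 0x20, misjudged  */
    assert(!is_blank_buggy(0x0E01));  /* ก: low byte 0x01, unaffected */
}
```

The cast keeps only the low byte, so 0x0E20 reaches isblank() as 0x20
and is indistinguishable from a real space.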
One workaround would be to replace the use of ISBLANK there with an
explicit test for the specific characters 0x09 and 0x20; but in case
there are other ISBLANK uses, I think it would be better to fix
If we really need to provide the isascii(c) macro here (I don't know
what other platforms/programs might break if we removed it), then I
propose making it at least somewhat more likely to be correct:
- #define isascii(c) 1
+ #define isascii(c) ((c) >= 0 && (c) <= 0x7f)
This protects ISBLANK from passing non-ASCII values through to
isblank(), and fixes the broken behavior in XeTeX.
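As a rough sketch of the two options discussed here, the following
compares a range-checked isascii() guard with the explicit tab/space
test mentioned as a workaround. RANGE_ISASCII, ISBLANK_FIXED,
is_blank_fixed and is_tab_or_space are hypothetical names chosen for
illustration; I am assuming ISBLANK has the general shape
"isascii(c) && isblank((unsigned char)c)", and the "C" locale:

```c
#include <assert.h>
#include <ctype.h>

/* Range-checked guard, same expression as the proposed macro. */
#define RANGE_ISASCII(c) ((c) >= 0 && (c) <= 0x7f)

/* Assumed general shape of the blank test; the guard now rejects
   anything outside 0..0x7f before the cast can truncate it. */
#define ISBLANK_FIXED(c) (RANGE_ISASCII(c) && isblank((unsigned char)(c)))

/* Returns nonzero only for code units that really are ASCII blanks. */
int is_blank_fixed(int c)
{
    return ISBLANK_FIXED(c) != 0;
}

/* The narrower workaround: test the two characters explicitly. */
int is_tab_or_space(int c)
{
    return c == 0x09 || c == 0x20;
}

/* Both variants should agree on the problem characters. */
void check_fixed_samples(void)
{
    assert(is_blank_fixed(0x0020) && is_tab_or_space(0x0020));   /* SPACE */
    assert(is_blank_fixed(0x0009) && is_tab_or_space(0x0009));   /* TAB   */
    assert(!is_blank_fixed(0x0109) && !is_tab_or_space(0x0109)); /* ĉ     */
    assert(!is_blank_fixed(0x0420) && !is_tab_or_space(0x0420)); /* Р     */
    assert(!is_blank_fixed(0x0E20) && !is_tab_or_space(0x0E20)); /* ภ     */
}
```

The range check short-circuits before the cast is evaluated, which is
why any other ISBLANK callers would be covered as well, whereas the
explicit test only repairs the one call site where it is used.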
(I don't have an SVN checkout set up for committing changes at the
moment, so I'm cc'ing Khaled and Peter here, and hope one of you can
take care of this promptly. Thanks!)