[tex-live] Critical bugfix needed for XeTeX [was: Re: Fwd: Duplicate Thai patterns reported by XeTeX in TL 2015 pretest]

Jonathan Kew jfkthame at gmail.com
Thu Apr 16 15:18:45 CEST 2015


The problem reported with Thai patterns, as well as the Korean problem 
mentioned by Dohyun Kim (incorrect \showthe output from a token list 
that should contain a couple of Korean characters), turns out to be a 
symptom of an input-processing bug.

This is a critical issue inasmuch as it can result in silently 
discarding certain characters from the user's input, with no indication 
that typesetting has failed in any way.

Any Unicode character whose code point has 0x20 or 0x09 as its low byte 
could be affected.

The problem arises from the (unsigned char) typecasts added in TL 
revision 34284:

http://tug.org/svn/texlive/trunk/Build/source/texk/kpathsea/c-ctype.h?r1=34283&r2=34284&

Combined with the "fake" isascii() definition found at:

http://tug.org/svn/texlive/trunk/Build/source/texk/kpathsea/c-ctype.h?annotate=34284#l27

which will override the system-defined isascii() if it is a function 
rather than a macro, this makes ISBLANK(c) return true for any UTF-16 
codepoint with 0x09 or 0x20 in the lower byte, regardless of its upper 
byte. This means that the code to "trim trailing whitespace" at:

http://tug.org/svn/texlive/trunk/Build/source/texk/web2c/xetexdir/XeTeX_ext.c?annotate=36591#l454

will also "trim" various other non-whitespace characters, such as 
Latin-script ĉ and Ġ, Cyrillic Љ and Р, Devanagari ठ, Thai ภ and many more.
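
To make the failure mode concrete, here is a minimal sketch of the effect. It does not reproduce the actual kpathsea or XeTeX source: FAKE_ISASCII, BROKEN_ISBLANK and trim_trailing_broken are hypothetical names, and the macro shape (argument passed uncast to the always-true isascii, then cast to unsigned char for isblank) is an assumption based on the description above.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kpathsea macros, simplified for
   illustration only. */
#define FAKE_ISASCII(c)   1
#define BROKEN_ISBLANK(c) (FAKE_ISASCII(c) && isblank((unsigned char)(c)))

/* Return the new length after dropping trailing "blank" 16-bit units.
   Because the cast keeps only the low byte, U+0E20 (Thai ภ) is seen
   as 0x20 and is trimmed along with genuine spaces and tabs. */
size_t trim_trailing_broken(const unsigned short *buf, size_t len)
{
    while (len > 0 && BROKEN_ISBLANK(buf[len - 1]))
        len--;
    return len;
}
```

Under these assumptions, a line ending in U+0E01 U+0E20 U+0020 is cut back to a single unit: the real trailing space goes, and so does the Thai letter before it.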

One workaround would be to replace the use of ISBLANK there with an 
explicit test for the specific characters 0x09 and 0x20; but in case 
there are other ISBLANK uses, I think it would be better to fix 
kpathsea/c-ctype.h.
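
As a rough sketch of what that workaround might look like (is_tab_or_space and trim_trailing_explicit are hypothetical names, not the actual XeTeX_ext.c code):

```c
#include <stddef.h>

/* Hypothetical helper: true only for U+0009 (tab) and U+0020 (space),
   compared as full 16-bit values so the upper byte is not ignored. */
int is_tab_or_space(unsigned short c)
{
    return c == 0x0009 || c == 0x0020;
}

/* Return the new length after dropping trailing tabs and spaces. */
size_t trim_trailing_explicit(const unsigned short *buf, size_t len)
{
    while (len > 0 && is_tab_or_space(buf[len - 1]))
        len--;
    return len;
}
```

This avoids the ctype macros entirely at this one call site, which is why it would not help any other ISBLANK users.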

If we really need to provide the isascii(c) macro here (I don't know 
what other platforms/programs might break if we removed it), then I 
propose making it at least somewhat more likely to be correct:

- #define isascii(c) 1
+ #define isascii(c) ((c) >= 0 && (c) <= 0x7f)

This will protect ISBLANK from passing non-ASCII values through to 
isblank, and fixes the broken behavior in XeTeX.
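
A minimal sketch of why the range check helps, assuming ISBLANK hands its argument to isascii uncast and only casts it for isblank (RANGE_ISASCII, GUARDED_ISBLANK and guarded_isblank are hypothetical names used here to avoid clashing with the system definitions):

```c
#include <ctype.h>

/* Same test as the proposed isascii(c) replacement, under another name. */
#define RANGE_ISASCII(c)   ((c) >= 0 && (c) <= 0x7f)

/* The range check sees the full value, so anything above 0x7f is
   rejected before the (unsigned char) cast can discard its upper byte. */
#define GUARDED_ISBLANK(c) (RANGE_ISASCII(c) && isblank((unsigned char)(c)))

/* Function wrapper returning exactly 0 or 1. */
int guarded_isblank(int c)
{
    return GUARDED_ISBLANK(c) != 0;
}
```

With this guard, 0x20 and 0x09 are still reported as blank, while values such as 0x0E20 or 0x0109 no longer are.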

(I don't have an SVN checkout set up for committing changes at the 
moment, so I'm cc'ing Khaled and Peter here, and hope one of you can 
take care of this promptly. Thanks!)

JK


