[tex-live] Critical bugfix needed for XeTeX [was: Re: Fwd: Duplicate Thai patterns reported by XeTeX in TL 2015 pretest]

Peter Breitenlohner peb at mppmu.mpg.de
Fri Apr 17 09:58:34 CEST 2015

On Thu, 16 Apr 2015, Jonathan Kew wrote:

> The problem reported with Thai patterns, as well as the Korean problem 
> mentioned by Dohyun Kim (incorrect \showthe output from a token list that 
> should contain a couple of Korean characters), turns out to be a symptom of 
> an input-processing bug.
> This is a critical issue inasmuch as it can result in silently discarding 
> certain characters from the user's input, with no indication that typesetting 
> has failed in any way.
> Any Unicode character whose low byte is 0x20 or 0x09 could be affected.
> The problem arises from the (unsigned char) typecasts added in TL revision 
> 34284:
> http://tug.org/svn/texlive/trunk/Build/source/texk/kpathsea/c-ctype.h?r1=34283&r2=34284&
> Combined with the "fake" isascii() definition found at:
> http://tug.org/svn/texlive/trunk/Build/source/texk/kpathsea/c-ctype.h?annotate=34284#l27
> which will override the system-defined isascii() if it is a function rather 
> than a macro, this makes ISBLANK(c) return true for any UTF-16 codepoint with 
> 0x09 or 0x20 in the lower byte, regardless of its upper byte. This means that 
> the code to "trim trailing whitespace" at:
> http://tug.org/svn/texlive/trunk/Build/source/texk/web2c/xetexdir/XeTeX_ext.c?annotate=36591#l454
> will also "trim" various other non-whitespace characters, such as 
> Latin-script ĉ and Ġ, Cyrillic Љ and Р, Devanagari ठ, Thai ภ and many more.
> One workaround would be to replace the use of ISBLANK there with an explicit 
> test for the specific characters 0x09 and 0x20; but in case there are other 
> ISBLANK uses, I think it would be better to fix kpathsea/c-ctype.h.
> If we really need to provide the isascii(c) macro here (I don't know what 
> other platforms/programs might break if we removed it), then I propose making 
> it at least somewhat more likely to be correct:

Hi Karl, Jonathan, Khaled,

this should really be fixed.  I have added a test whether isascii is either
defined as a macro or declared as a function (or both), and only otherwise
   #define isascii(c) (((c) & ~0x7f) == 0)
(stolen from GNU libc).
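
To illustrate the effect, here is a minimal sketch and not the actual
c-ctype.h code.  The names FAKE_ISASCII, REAL_ISASCII, BLANK_BROKEN,
BLANK_FIXED and the wrapper functions are hypothetical, and the old
fallback is assumed to accept every value, as the "fake" definition
described above suggests.  It shows how the (unsigned char) cast alone
turns U+0E20 into a blank, and how the mask-based test prevents that:

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical stand-in for the old fallback: every value counts as ASCII. */
#define FAKE_ISASCII(c) 1

/* Fallback quoted in this message (from GNU libc). */
#define REAL_ISASCII(c) (((c) & ~0x7f) == 0)

/* The (unsigned char) cast keeps only the low byte of c. */
#define BLANK_BROKEN(c) (FAKE_ISASCII(c) && isblank((unsigned char)(c)))
#define BLANK_FIXED(c)  (REAL_ISASCII(c) && isblank((unsigned char)(c)))

/* Function wrappers so the results are easy to inspect; normalized to 0/1. */
int is_blank_broken(unsigned int c) { return BLANK_BROKEN(c) ? 1 : 0; }
int is_blank_fixed(unsigned int c)  { return BLANK_FIXED(c) ? 1 : 0; }

/* Self-check: U+0E20 (Thai) has low byte 0x20 and must not be a blank. */
void self_check(void)
{
    assert(is_blank_broken(0x0E20) == 1);  /* wrongly treated as a blank */
    assert(is_blank_fixed(0x0E20) == 0);   /* rejected by the ASCII test */
    assert(is_blank_fixed(0x20) == 1);     /* real space still matches */
    assert(is_blank_fixed(0x09) == 1);     /* real tab still matches */
}
```

The ASCII test has to see the full value before the cast discards the
upper byte, which is why the fallback matters here.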

Unfortunately this implies changes to the binaries on systems where
isascii is not defined as a macro (supposedly Darwin and perhaps others).

