[XeTeX] (not) understanding XeTeXinterchartoks

Jonathan Kew jfkthame at gmail.com
Fri May 8 14:44:20 CEST 2015


On 8/5/15 12:54, David Carlisle wrote:
> On 8 May 2015 at 11:45, Adam Twardoch (List) <list.adam at twardoch.com> wrote:
>> In modern text processing (Unicode+OpenType), a text run is a series of characters with the same formatting (font, size, color etc.), directionality (ltr, rtl) and script (writing system such as Latin, Greek, Arabic or Gujarati).
>>
>
> er, yes, quite:-)
>
> so my question is why did this X trigger the 255 boundary for end of
> text run processing.
>

Well... it's a long time since I touched any of this, but let's see if 
we can figure it out. I don't suppose it's clearly spelled out anywhere 
just what such a "run" is, for interchartoks purposes.

From looking at the code, it appears that XeTeX sets the "boundary"
state at the beginning of an interchartoks-inserted token list; and if 
-- as in your example -- that token list doesn't contain any characters 
that cause the current state to change, then the following character 
will be treated as adjacent to that boundary. Which is why it then sees 
the "255 \Xclass" on (re-)encountering the X after processing the "0 
\Xclass" insertion.

So the lesson seems to be that if you're going to provide interchartoks 
for a <boundary><something> transition, whatever you insert had better 
cause a change in the current class -- i.e. insert an actual character 
of some kind -- otherwise you're headed for a loop. If what you want to 
do here doesn't involve inserting text, then you probably want to 
locally disable interchartoks processing. E.g. if you modify your 
example to say something like

   \def\zza{\begingroup\XeTeXinterchartokenstate=0 \futurelet\tmpa\zzza}
   \def\zzza#1{#1\show\tmpa\endgroup}

then you'll get \zza executed once, showing \tmpa as expected, but your 
\zzb never gets hit.
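
For anyone who wants to try this in isolation, here is a minimal self-contained sketch of the same idea for plain XeTeX. It is not the original test file, which is not quoted in this message. The fixed class number 10, the macro names \peek and \dopeek, and the one-character test text are all assumptions made for illustration. It also assumes 255 is the boundary class, as it was in XeTeX at the time of this thread; later releases are understood to use 4095 instead, so adjust that number if nothing happens.

```tex
% Sketch only; run with: xetex sketch.tex
% Assumptions: class number 10, macro names \peek and \dopeek, test text "X".
\XeTeXinterchartokenstate=1        % enable interchartoks processing
\chardef\Xclass=10                 % hypothetical class number for this sketch
\XeTeXcharclass`\X=\Xclass         % put the letter X in that class

% 255 is assumed to be the boundary class (newer XeTeX: probably 4095).
\XeTeXinterchartoks 255 \Xclass = {\peek}

% \peek inserts no text of its own, so it switches interchartoks off inside
% a group before the X is read again; otherwise the re-read X would be seen
% as <boundary><X> once more and \peek would be inserted again, forever.
\def\peek{\begingroup\XeTeXinterchartokenstate=0 \futurelet\next\dopeek}

% Put the character back, report what \next was, then close the group so
% that interchartoks processing is restored for the following text.
\def\dopeek#1{#1\message{[next is \meaning\next]}\endgroup}

X
\bye
```

Removing the \XeTeXinterchartokenstate=0 assignment from \peek should reproduce the loop described above, since the token list then leaves the current class unchanged before the X is re-encountered.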

JK


