[XeTeX] wildcard interchar class?

Jonathan Kew jfkthame at googlemail.com
Thu Oct 1 01:08:19 CEST 2009


On 29 Sep 2009, at 20:12, Michiel Kamermans wrote:

> Hi all,
>
> would it be hard to add a "wildcard" interchar class to XeTeX, to
> allow signalling 'going to' and 'leaving' character classes? I was
> thinking along the lines of a special class that lets you say:
>
> \XeTeXinterchartoks * 4 = {\classstartcommand{something}{or}{other}}
> \XeTeXinterchartoks 4 * = {\classendcommand{something}{or}{other}}
>
> This would simplify those setups where the change behaviour is  
> dictated not so much by which transition occurs, but by which class  
> is encountered, and would remove the need for adding transition  
> rules for every class already in the preamble when a new character  
> class gets added.
>
> In terms of precedence, the specification order could be used to  
> determine transition behaviour, with the wildcard binding  
> pragmatically needing to be listed first, followed by specific inter- 
> class transition rules, so that the two never conflict: either a  
> transition is caught by two wildcard rules (the end of the current  
> class, and the start of the next), or there's an explicit rule that  
> overrides the default behaviour.
>
> I don't know how hard it would be to add this, but this
> functionality seems desirable to have in XeTeX: it doesn't
> interfere with the way things work right now, but would greatly
> reduce the number of transition rules when only start/end rules
> are needed. Even for the five default classes (0, 1, 2, 3, 255) it
> would reduce the number of required transition rules from 8 per
> class (increasing linearly as 2n-2, with the total rule count being
> the quadratic n(n-1), since each rule is shared by two classes) to
> only 2 (a fixed number irrespective of class count, with the total
> rule count being the linear 2n).

Yes, I can imagine this being pretty useful.

In terms of implementation, the simplest thing would be to reserve an  
additional "magic" value (255 is already a special case, representing  
the boundary of a character run). I'm assuming nobody *really* needs  
all the remaining classes from 0..254 for real character classes, so  
it should be OK to define 254 as "wildcard".
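
For concreteness, here is roughly how the two rules from the original message might look under that assumption, with 254 standing in for `*`. This is only a sketch of the proposal: no released XeTeX treats 254 specially, so today these lines would simply define transitions to and from an ordinary class 254. Class 4 and the two commands are taken from the example above.

```tex
% Hypothetical: assumes 254 has been reserved as the wildcard class.
% In current XeTeX, 254 is just another class number.
\XeTeXinterchartokenstate = 1  % enable interchar token insertion
\XeTeXinterchartoks 254 4 = {\classstartcommand{something}{or}{other}}
\XeTeXinterchartoks 4 254 = {\classendcommand{something}{or}{other}}
```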

(I realize that having a special wildcard token such as *, and not  
reserving magic numbers, would appear tidier. But this should all be  
hidden away behind macros anyhow. We'll just need to limit the  
\newXeTeXintercharclass allocator so that it refuses to allocate  
254 or anything beyond it.)
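
Until something like this exists, a macro layer can get the same effect by writing out the explicit rules. The following is a minimal sketch, assuming a format that provides \newXeTeXintercharclass and that only the five default classes are in use; \myclass, \myclassstart, \myclassend and \setclassrules are made-up names for illustration.

```tex
\XeTeXinterchartokenstate = 1   % enable interchar token insertion
\newXeTeXintercharclass\myclass % allocate the new class

% Placeholder start/end commands (hypothetical; replace with real ones).
\def\myclassstart{[}
\def\myclassend{]}

% Emulate "* -> \myclass" and "\myclass -> *" for one existing class #1.
\def\setclassrules#1{%
  \XeTeXinterchartoks #1 \myclass = {\myclassstart}%
  \XeTeXinterchartoks \myclass #1 = {\myclassend}%
}

% One call per class already in use: the five default classes.
\setclassrules{0}\setclassrules{1}\setclassrules{2}%
\setclassrules{3}\setclassrules{255}
```

The drawback is the one the wildcard is meant to remove: every class added later needs its own \setclassrules call for each class already set up this way.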

I'll try to look into doing this sometime soon; unfortunately, it's  
too late to add features for the TL'09 release.

JK


