[XeTeX] not enough \XeTeXcharclass registers

Bruno Le Floch blflatex at gmail.com
Mon Feb 1 14:11:36 CET 2016

On 2/1/16, David Carlisle <d.p.carlisle at gmail.com> wrote:
> On 1 February 2016 at 10:53, Jonathan Kew <jfkthame at gmail.com> wrote:
>> On 1/2/16 10:25, David Carlisle wrote:
>>> y.
>> You're right, of course; this is a limitation of the concept as currently
>> implemented.
>> In practice, I suppose I don't expect there to be all that many "generic
>> purposes" for which intercharclass is really a useful tool. For example,
>> it's hard to see how it could work well for bidi issues, because of the
>> problem of resolving neutral characters -- especially run-initial
>> neutrals.
> Yes I hesitated about using bidi as the example there, but couldn't think
> of many other generally applicable things:-)
>>> Would it be impossibly difficult to extend the concept so that a
>>> character takes a list of character classes so that you can classify
>>> characters in more than one way without needing impossibly many
>>> character classes to do that?
>> There would be two aspects to this: first, extending the character class
>> storage so as to allow a list rather than a single number. Currently, it's
>> stashed in the upper part of the word where sfcode already lives, making
>> the implementation very simple and cheap.
>> And second, checking for the existence of a token list for the current
>> boundary would become significantly more expensive.
> Yes, I suspected as much; perhaps it's a non-starter. If the extended number
> as in the current test branch is "still cheap" then perhaps that's the way
> to go although character classes always seem like they are almost a
> solution to a problem but never quite powerful enough.
>> Currently, we just combine the two classes at the boundary to get a single
>> 32-bit number, and do a simple lookup (in a sparse array) to see if there's
>> anything defined. With class lists, we'd need to do this for each of the
>> classes in the two lists -- i.e. m * n sparse-array lookups. Or perhaps go
>> at it from the other direction: iterate over a list of defined transitions,
>> and check whether each of them applies.
> Makes sense.
>> Oh, and if there are multiple matches at a given boundary, what happens?
>> Using an imaginary extension to support lists:
>>   \XeTeXintercharclasses `A = { 1, 2 }
>>   \XeTeXintercharclasses `B = { 3, 4 }
>>   \XeTeXinterchartoks 1 3 = { foo }
>>   \XeTeXinterchartoks 1 4 = { bar }
>>   \XeTeXinterchartoks 2 3 = { xyzzy }
>>   \XeTeXinterchartoks 2 4 = { plugh }
>> What happens at the boundary in "AB"? Should it depend on the numerical
>> values of the classes, or the order in which the transitions were
>> specified, or what?
>> (I'm not saying the idea is a bad one; I can imagine it might be quite
>> useful. But I can also imagine it getting a bit hairy......)
>> JK
> Yes it certainly wasn't a fully worked proposal, but I thought it worth
> commenting while you were looking at that area of the code.
> David
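
To make the cost and the ambiguity in Jonathan's example concrete, here is
a rough Python model of the lookup. It is only a sketch, not XeTeX's actual
code: the 16-bit packing of the two classes, the dictionary standing in for
the sparse array, and all function names are assumptions for illustration.

```python
# Toy model of the boundary lookup; not XeTeX internals.
# Assumption: each class fits in 16 bits, so a pair packs into 32 bits.

def pair_key(left, right):
    """Combine two class numbers into a single 32-bit key."""
    return (left << 16) | right

def lookup_single(toks, left, right):
    """Current scheme: one class per character, one sparse lookup."""
    return toks.get(pair_key(left, right))

def lookup_lists(toks, left_classes, right_classes):
    """Hypothetical list scheme: m * n lookups, all matches returned."""
    matches = []
    for left in left_classes:
        for right in right_classes:
            found = toks.get(pair_key(left, right))
            if found is not None:
                matches.append(((left, right), found))
    return matches

# The imaginary definitions from the quoted example.
toks = {
    pair_key(1, 3): "foo",
    pair_key(1, 4): "bar",
    pair_key(2, 3): "xyzzy",
    pair_key(2, 4): "plugh",
}
classes = {"A": [1, 2], "B": [3, 4]}

# All four transitions match at the boundary in "AB".
matches = lookup_lists(toks, classes["A"], classes["B"])
```

The model deliberately returns every match and picks none, since which one
should win (numerical order, definition order, or something else) is exactly
the open question.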

Even with the current intercharclass one could write a package to
implement proposals such as David's (allowing whatever ordering of
transitions people want).  Such a package would define all transitions
to run the same code (including transitions with non-character
primitives), which would test the next token using \futurelet and save
its character code (or other info) in a global variable, say,
\lastchar.  At every step, one can use \lastchar and the next token to
decide what transition to do using whatever rules the package author
thinks of.  Major drawback: kerning is lost.
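
The control flow of that idea can be modelled in a few lines of Python.
This is only a sketch of the dispatch logic, not TeX code: the names `run`
and `decide` and the letter/digit rule are made up for illustration, and the
model says nothing about how \futurelet would be used or about the kerning
problem.

```python
# Toy model of "every transition runs the same code"; not TeX.

def decide(last, nxt):
    """Hypothetical package rule: mark every letter/digit boundary."""
    if last is None:  # start of the run, nothing seen yet
        return ""
    if last.isalpha() and nxt.isdigit():
        return "|"
    if last.isdigit() and nxt.isalpha():
        return "|"
    return ""

def run(text, rule):
    """Call one shared hook at every boundary, tracking the last char."""
    out = []
    last = None  # plays the role of the global \lastchar
    for ch in text:
        out.append(rule(last, ch))  # material inserted before ch
        out.append(ch)
        last = ch
    return "".join(out)

result = run("ab12c", decide)  # "ab|12|c"
```

Because the rule is an ordinary function of the previous and next
character, the package author is free to classify characters in several
ways at once and to resolve conflicts in any order.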

Note that this can be used to implement bidi too: just collect neutral
characters rather than leaving them right away in the output.

Not saying that's the right way to do it, but it could be made to work.

