[XeTeX] not enough \XeTeXcharclass registers

David Carlisle d.p.carlisle at gmail.com
Mon Feb 1 13:09:25 CET 2016


On 1 February 2016 at 10:53, Jonathan Kew <jfkthame at gmail.com> wrote:

> On 1/2/16 10:25, David Carlisle wrote:
>
>> y.
>>
>
> You're right, of course; this is a limitation of the concept as currently
> implemented.
>
> In practice, I suppose I don't expect there to be all that many "generic
> purposes" for which intercharclass is really a useful tool. For example,
> it's hard to see how it could work well for bidi issues, because of the
> problem of resolving neutral characters -- especially run-initial neutrals.
>

Yes, I hesitated about using bidi as the example there, but I couldn't
think of many other generally applicable things :-)

>
>
>> Would it be impossibly difficult to extend the concept so that a
>> character takes a list of character classes so that you can classify
>> characters in more than one way without needing impossibly many
>> character classes to do that?
>>
>
> There would be two aspects to this: first, extending the character class
> storage so as to allow a list rather than a single number. Currently, it's
> stashed in the upper part of the word where sfcode already lives, making
> the implementation very simple and cheap.
>
> And second, checking for the existence of a token list for the current
> boundary would become significantly more expensive.


Yes, I suspected as much; perhaps it's a non-starter. If the extended number,
as in the current test branch, is "still cheap", then perhaps that's the way
to go, although character classes always seem to be almost a solution to a
problem but never quite powerful enough.
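
For anyone following along, the "upper part of the word" storage can be
pictured roughly as below. This is only a sketch in Python: the 16/16 bit
split and the helper names are my assumptions for illustration, not the
actual XeTeX source.

```python
# Illustrative only: pack a character class next to an sfcode in one word.
# The 16-bit split and all names here are assumptions, not XeTeX internals.
SF_BITS = 16
SF_MASK = (1 << SF_BITS) - 1


def pack(sfcode, charclass):
    """Store charclass in the upper bits and sfcode in the lower bits."""
    if not 0 <= sfcode <= SF_MASK:
        raise ValueError("sfcode out of range")
    if charclass < 0:
        raise ValueError("charclass must be non-negative")
    return (charclass << SF_BITS) | sfcode


def sfcode_of(word):
    """Recover the sfcode from the lower bits."""
    return word & SF_MASK


def charclass_of(word):
    """Recover the character class from the upper bits."""
    return word >> SF_BITS


word = pack(sfcode=1000, charclass=7)
```

Reading a single class back costs one shift, which is why it is cheap; a
list of classes per character would not fit in the same word and would
presumably need a separate table.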


> Currently, we just combine the two classes at the boundary to get a single
> 32-bit number, and do a simple lookup (in a sparse array) to see if there's
> anything defined. With class lists, we'd need to do this for each of the
> classes in the two lists -- i.e. m * n sparse-array lookups. Or perhaps go
> at it from the other direction: iterate over a list of defined transitions,
> and check whether each of them applies.
>

Makes sense.
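
To check my understanding of the cost, here is a rough Python sketch of the
single lookup versus the two class-list strategies you describe. A dict
stands in for the sparse array, and the key layout, the 16-bit shift and all
function names are assumptions of mine, not how XeTeX actually does it.

```python
# Illustrative only: names and the 16-bit shift are assumptions.
CLASS_BITS = 16
CLASS_MASK = (1 << CLASS_BITS) - 1


def key(left, right):
    """Combine two class numbers into one integer key."""
    return (left << CLASS_BITS) | right


# A dict stands in for the sparse array of defined transitions.
interchartoks = {key(1, 3): "foo", key(2, 4): "plugh"}


def lookup_single(left, right):
    """One class per character: a single lookup per boundary."""
    return interchartoks.get(key(left, right))


def lookup_lists(left_classes, right_classes):
    """Class lists: try every pair, i.e. m * n lookups per boundary."""
    hits = []
    for l in left_classes:
        for r in right_classes:
            toks = interchartoks.get(key(l, r))
            if toks is not None:
                hits.append((l, r, toks))
    return hits


def lookup_by_transitions(left_classes, right_classes):
    """Other direction: walk the defined transitions and test each one."""
    lset, rset = set(left_classes), set(right_classes)
    hits = []
    for k, toks in interchartoks.items():
        l, r = k >> CLASS_BITS, k & CLASS_MASK
        if l in lset and r in rset:
            hits.append((l, r, toks))
    return hits
```

The pairwise version scales with the lengths of the two class lists, while
the second scales with the number of transitions defined, so which one is
cheaper would depend on how the feature ends up being used.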


>
> Oh, and if there are multiple matches at a given boundary, what happens?
> Using an imaginary extension to support lists:
>
>   \XeTeXintercharclasses `A = { 1, 2 }
>   \XeTeXintercharclasses `B = { 3, 4 }
>
>   \XeTeXinterchartoks 1 3 = { foo }
>   \XeTeXinterchartoks 1 4 = { bar }
>   \XeTeXinterchartoks 2 3 = { xyzzy }
>   \XeTeXinterchartoks 2 4 = { plugh }
>
> What happens at the boundary in "AB"? Should it depend on the numerical
> values of the classes, or the order in which the transitions were
> specified, or what?
>
> (I'm not saying the idea is a bad one; I can imagine it might be quite
> useful. But I can also imagine it getting a bit hairy......)


Yes, it certainly wasn't a fully worked proposal, but I thought it worth
commenting while you were looking at that area of the code.
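
To make the ambiguity in your "AB" example concrete, here is a small Python
sketch in which all four transitions match and the result depends entirely on
which tie-breaking rule is chosen. The rules and names are hypothetical; I am
not suggesting any of them is the right answer.

```python
# Illustrative only: the tie-breaking policies and names are hypothetical.
# Transitions are listed in the order they were specified in the example.
transitions = [
    ((1, 3), "foo"),
    ((1, 4), "bar"),
    ((2, 3), "xyzzy"),
    ((2, 4), "plugh"),
]


def matches(left_classes, right_classes):
    """All defined transitions that apply at this boundary, in definition order."""
    return [
        (pair, toks)
        for pair, toks in transitions
        if pair[0] in left_classes and pair[1] in right_classes
    ]


def pick_first_defined(cands):
    """Policy 1: the transition specified earliest wins."""
    return cands[0][1] if cands else None


def pick_last_defined(cands):
    """Policy 2: the transition specified latest wins."""
    return cands[-1][1] if cands else None


def pick_lowest_classes(cands):
    """Policy 3: the numerically smallest (left, right) class pair wins."""
    return min(cands, key=lambda c: c[0])[1] if cands else None


# `A has classes {1, 2} and `B has classes {3, 4}, as in the example.
cands = matches({1, 2}, {3, 4})
```

With the example as written, "first defined" and "lowest classes" happen to
agree on foo while "last defined" gives plugh, but reordering the four
\XeTeXinterchartoks lines would change the first two independently.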



>
>
> JK
>
>
David