Simply changing the character-count constant could cause some subtle problems.

In particular, the allocation size of a "whatsit" language node might also need to change, which would require adjusting other code in the core engine that assumes the default small node size for the language subtype of a whatsit.

Or not.  I can't tell from the TeX source what the bit widths of these node fields are.  But if they're too small to hold a pair of enlarged character-count limits for hyphenation, there will likely be bugs elsewhere due to truncation or wraparound in the arithmetic.


Doug McKenna

Dear Jonathan,

Yes, recompiling xetex is fine!

At 255 characters I still have 32 occurrences left, at 500 two, and at
1000 zero. Thanks for looking into this!


On Tue, Mar 15, 2016 at 3:46 PM, Jonathan Kew <jfkthame at gmail.com> wrote:
> On 15/3/16 14:24, Peter Mukunda Pasedach wrote:
>> Dear XeTeX list,
>> I am dealing with a collection of texts in Sanskrit, for which the
>> builtin limitation of TeX to not perform hyphenation after the 63rd
>> character of a string is imposing a serious limitation, as such
>> strings do occur. One reason for this is that one can freely form very
>> long compounds; another is sandhi, in which, due to euphonic changes,
>> final and initial vowels fuse; another is that in Indic scripts, if one
>> word ends in a consonant and the next starts with a vowel, they are
>> written together; yet another can be that scribes simply do not use
>> spaces consistently. Thus, in the collection of texts that I'm working
>> on, currently comprising 37 files, strings of more than 63 characters
>> occur 1823 times.
>> Is this limitation of 63 characters just an odd remnant of the time
>> TeX was written in, then necessary because of hardware limitations, or
>> does it still make sense? Is there a reasonable way to remove it, or
>> set it significantly higher?
> I suspect (without actually checking the code) that it would be fairly
> trivial to make it significantly higher (less so to remove it entirely; but
> something like 255 or even 1000-plus would probably be simple).
> A change like this would need to be optional, however, so that the
> typesetting of existing documents does not change unless the user
> deliberately chooses the modified behavior.
> It's probably too late to be adding a new feature for the TL'16 release; are
> you prepared to recompile xetex yourself from source in order to make such a
> change?
> JK
