[XeTeX] Hyphenation of strings of more than 63 characters

Doug McKenna doug at mathemaesthetics.com
Tue Mar 15 19:04:30 CET 2016

Simply changing the character-count constant could cause some subtle problems.

In particular, the allocation size of a language "whatsit" node might also need to change. That would in turn require adjusting other code in the core engine that assumes the default small size for the language subtype of a whatsit.

Or not.  I can't tell from the TeX source what the bit sizes of these node fields are.  But if they're too small to fit a pair of enhanced character count limits for hyphenation, there will likely be bugs elsewhere due to truncation or wraparound in the arithmetic.
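To make the truncation/wraparound concern concrete, here is a minimal C sketch. It is NOT taken from the TeX or XeTeX source. The struct, its field names, and the 6-bit widths are all hypothetical, chosen only to show what happens when an enlarged limit is stored in a field too narrow to hold it.

```c
#include <stdio.h>

/* HYPOTHETICAL layout: the field names and 6-bit widths are assumptions
   for illustration only; they are not taken from the TeX/XeTeX source. */
struct lang_whatsit_sketch {
    unsigned int first_limit : 6;  /* assumed width: holds only 0..63 */
    unsigned int second_limit : 6; /* assumed width: holds only 0..63 */
};

/* Store a value in the 6-bit first_limit field and return what is read
   back. For an unsigned bit-field, C reduces the value modulo 2^6, so a
   limit above 63 silently wraps around instead of failing loudly. */
unsigned int store_and_read(unsigned int value) {
    struct lang_whatsit_sketch n = {0, 0};
    n.first_limit = value;
    return n.first_limit;
}

/* Print a few enlarged limits next to what the narrow field keeps. */
void show_wraparound(void) {
    const unsigned int limits[] = {63, 64, 255, 1000};
    for (size_t i = 0; i < sizeof limits / sizeof limits[0]; i++) {
        printf("stored %u, read back %u\n", limits[i], store_and_read(limits[i]));
    }
}
```

Calling `show_wraparound()` reports 63 read back as 63, but 64 as 0, 255 as 63, and 1000 as 40. No error is raised, which is exactly the kind of silent bug that would surface elsewhere if a real field turned out to be too small.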


Doug McKenna

----- Original Message -----
From: "Peter Mukunda Pasedach" <peter.pasedach at googlemail.com>
To: "XeTeX (Unicode-based TeX) discussion." <xetex at tug.org>
Sent: Tuesday, March 15, 2016 9:13:08 AM
Subject: Re: [XeTeX] Hyphenation of strings of more than 63 characters

Dear Jonathan,

yes, recompiling xetex is fine!

With a limit of 255 characters I still have 32 occurrences left; at 500,
two; and at 1000, none. Thanks for looking into this!
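Counts like these can be estimated with a short script. The following is a rough C sketch, not the method actually used above: it assumes UTF-8 input, treats a "string" as any run of bytes between ASCII whitespace, and counts Unicode code points. TeX's own notion of a hyphenatable word is narrower (it stops at non-letters, font changes, and so on), so the numbers are only approximate. The function names are hypothetical.

```c
#include <stddef.h>

/* Count UTF-8 code points in s[0..len): every byte that is not a
   continuation byte (bit pattern 10xxxxxx) starts a new code point. */
static size_t utf8_length(const unsigned char *s, size_t len) {
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) n++;
    }
    return n;
}

static int is_ascii_space(unsigned char c) {
    return c == ' ' || c == '\t' || c == '\n' || c == '\r';
}

/* HYPOTHETICAL helper: count whitespace-delimited strings in a
   NUL-terminated UTF-8 text that are longer than `limit` code points. */
size_t count_long_strings(const char *text, size_t limit) {
    const unsigned char *p = (const unsigned char *)text;
    size_t count = 0;
    while (*p) {
        while (*p && is_ascii_space(*p)) p++;           /* skip separators */
        const unsigned char *start = p;
        while (*p && !is_ascii_space(*p)) p++;          /* scan one string */
        if (p > start && utf8_length(start, (size_t)(p - start)) > limit) {
            count++;
        }
    }
    return count;
}
```

Running `count_long_strings` over a file's contents with limits of 63, 255, 500, and 1000 gives the same kind of table as above. Note that in Devanagari, vowel signs and viramas are separate code points, so they add to each string's length here.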


On Tue, Mar 15, 2016 at 3:46 PM, Jonathan Kew <jfkthame at gmail.com> wrote:
> On 15/3/16 14:24, Peter Mukunda Pasedach wrote:
>> Dear XeTeX list,
>> I am dealing with a collection of texts in Sanskrit, for which TeX's
>> built-in restriction against hyphenating after the 63rd character of a
>> string is a serious limitation, since such strings do occur. There are
>> several reasons for this: one can freely form very long compounds; in
>> sandhi, euphonic changes fuse the final and initial vowels of adjacent
>> words; in Indic scripts, a word ending in a consonant is written
>> together with a following word that begins with a vowel; and scribes
>> simply do not use spaces consistently. Thus in the collection of texts
>> I'm working on, currently comprising 37 files, strings of more than 63
>> characters occur 1823 times.
>> Is this 63-character limit just an odd remnant of the era in which TeX
>> was written, necessary then because of hardware limitations, or does it
>> still make sense? Is there a reasonable way to remove it, or to set it
>> significantly higher?
> I suspect (without actually checking the code) that it would be fairly
> trivial to make it significantly higher (less so to remove it entirely; but
> something like 255 or even 1000-plus would probably be simple).
> A change like this would need to be optional, however, so that the
> typesetting of existing documents does not change unless the user
> deliberately chooses the modified behavior.
> It's probably too late to be adding a new feature for the TL'16 release; are
> you prepared to recompile xetex yourself from source in order to make such a
> change?
> JK
> --------------------------------------------------
> Subscriptions, Archive, and List information, etc.:
>  http://tug.org/mailman/listinfo/xetex
