[XeTeX] hyphenation in Ethiopian languages

Mojca Miklavec
Fri May 6 20:00:33 CEST 2011

Dear Jonathan,

Jonathan Kew wrote:
> On 6 May 2011, at 18:03, Mojca Miklavec wrote:
>> Adam McCollum wrote:
>>> Dear list members,
>>> I've recently drawn up a short document in Ge`ez (classical Ethiopic) using
>>> Polyglossia and I see that the hyphenation is wrong. As some of you know,
>>> languages that use the Ethiopic script, including Ge`ez and Amharic, place a
>>> word divider—it looks somewhat like a thick colon—between each word and two
>>> of these dividers side by side between sentences; see some Amharic examples
>>> here. That being the case, a word may be broken at any syllable (the script
>>> is a syllabary, not an alphabet) at the end of a line, but there is nothing
>>> corresponding to a hyphen. An additional matter of importance is that no
>>> line should begin with the single or double word divider. How should this be
>>> fixed?
>> Dear Adam,
>> We have submitted Ethiopic hyphenation patterns to CTAN (and TL) a
>> while ago, so once you update your TeX Live, it should work out of the
>> box.
>> ......
> For line-breaking after the word separators, doesn't it work to just set
>  \XeTeXlinebreaklocale "en"
>  \XeTeXlinebreakskip 0pt plus 1pt

Hm. Quite possible. None of us (or at least not me) knew about
linebreaklocale and linebreakskip, or at least didn't quite think of
them. We'll test, thanks a lot for the hint.
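
A minimal test file for this could look roughly like the sketch below.
It is untested; the font name "Abyssinica SIL" and the sample text are
only assumptions for illustration, and any installed Ethiopic font
should do.

```latex
% Untested sketch. "Abyssinica SIL" is an assumed font name;
% replace it with any Ethiopic font installed on your system.
\documentclass{article}
\usepackage{fontspec}
\setmainfont{Abyssinica SIL}

% The two settings suggested by Jonathan above:
\XeTeXlinebreaklocale "en"
\XeTeXlinebreakskip 0pt plus 1pt

\begin{document}
% Narrow box to force several line breaks. The text is a placeholder:
% one word repeated, joined by the Ethiopic word divider (U+1361).
\parbox{4cm}{ሰላም፡ሰላም፡ሰላም፡ሰላም፡ሰላም፡ሰላም፡ሰላም፡ሰላም፡ሰላም።}
\end{document}
```

Comparing the output with and without the two \XeTeX... lines should
show whether breaks now occur after the dividers and whether any line
still starts with one.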

What exactly does \XeTeXlinebreaklocale "en" do? (After all, we need
to break Ethiopic text, not English.) And where is "0pt plus
1pt" applied? Between all characters or just at the end? How is the
end of a line determined?

Interestingly enough, one of the first hits brings me back to "Word
wrapping in Lao",
which is also being heavily discussed off-list recently. We are
experiencing exactly the same problem there: lines too long for
the hyphenation algorithm to work properly.

We are aware of ICU, but nobody knows how to write ICU code even if
the algorithm is somewhat straightforward.

I hope to have Lao hyphenation patterns ready soon and then we will
try to apply some XeTeXinterchartoks-based breaks between letters that
always start or end a syllable, only hoping that there will be enough
of such letters to cut the remaining text into
shorter-than-64-character sequences.
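
Roughly what I have in mind is sketched below. It is untested, and the
class name \laoprevowel as well as the choice of characters are only
assumptions for illustration: the five Lao vowel signs U+0EC0..U+0EC4
are written before the consonant, so I assume here that they always
start a syllable. I also assume that all other Lao letters are still in
the default class 0.

```latex
% Untested sketch; \laoprevowel is a hypothetical class name.
\XeTeXinterchartokenstate=1          % switch the interchartoks mechanism on
\newXeTeXintercharclass\laoprevowel  % allocate a new character class

% Assumption: these vowel signs precede the consonant in the text
% stream and therefore always begin a syllable.
\XeTeXcharclass "0EC0 = \laoprevowel
\XeTeXcharclass "0EC1 = \laoprevowel
\XeTeXcharclass "0EC2 = \laoprevowel
\XeTeXcharclass "0EC3 = \laoprevowel
\XeTeXcharclass "0EC4 = \laoprevowel

% Between an ordinary character (class 0) and a following pre-vowel,
% insert zero-width glue: a legal break point that also ends the
% current "word" as far as the hyphenation routine is concerned.
\XeTeXinterchartoks 0 \laoprevowel = {\hskip 0pt\relax}
```

Whether this yields enough break points obviously depends on how often
such letters occur in real text.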

Is there really no way to increase the limit for hyphenation in XeTeX
from 64 characters to something safer? LuaTeX sets the limit at 256.


