[XeTeX] \hyphenation{} and combining diacritics

Joshua and Amy josh.ruthamy at gmail.com
Fri Jul 8 22:00:42 CEST 2011

I'm creating some hyphenation rules for Jarai texts that I'm
interlinearizing. Here's the problem: In various texts, a complex character
such as LATIN SMALL LETTER A WITH BREVE might be encoded as a single code
point (U+0103) or as a combination of code points (LATIN SMALL LETTER A:
U+0061 plus COMBINING BREVE: U+0306). The \hyphenation{} command does not
treat the two encodings as the same, meaning that I have to create two versions
of a word if it has one accented character, four versions if it has two
accented characters, eight versions if it has three, etc. For example:

\hyphenation{hơ-nuă hơ-nuă hơ-nuă hơ-nuă}

(because O WITH HORN, like A WITH BREVE, can be two code points or one)
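
To make the combinatorics concrete, here is a minimal Python sketch (not TeX, and not a fix for the problem) that lists the spellings of a word when each accented character is either fully precomposed or fully decomposed. The function name encoding_variants is made up for illustration, and the sketch assumes only those two forms per character; characters carrying several diacritics can have additional partially composed forms that it does not enumerate.

```python
import itertools
import unicodedata


def encoding_variants(word):
    """Return every spelling of `word` in which each character is either
    fully precomposed (NFC) or fully decomposed (NFD).

    Hypothetical helper for illustration only.
    """
    choices = []
    for ch in unicodedata.normalize("NFC", word):
        decomposed = unicodedata.normalize("NFD", ch)
        # Characters without a decomposition contribute a single option.
        choices.append([ch] if decomposed == ch else [ch, decomposed])
    return ["".join(parts) for parts in itertools.product(*choices)]


variants = encoding_variants("h\u01a1-nu\u0103")  # the example word above
print(len(variants))  # 4
print("\\hyphenation{" + " ".join(variants) + "}")
```

The printed \hyphenation{} line contains four entries that look identical on screen but differ in their underlying code points, which is exactly the list I am currently writing by hand.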

Is there a simple way to tell (Xe)LaTeX to treat precomposed and decomposed
characters identically, without my having to list every possible combination?