[tex-hyphen] Help with UTF-8 Language
mojca.miklavec.lists at gmail.com
Thu Oct 9 22:43:15 CEST 2014
On Fri, Oct 10, 2014 at 12:08 AM, Philip Taylor <P.Taylor at rhul.ac.uk> wrote:
> Werner LEMBERG wrote:
>> I think XeTeX doesn't offer this level of manipulation. However, I
>> don't know how much it has inherited from Omega.
> Nor I. Let's hope Jonathan or another well-informed person can clarify
> this point.
>> `opatgen' should be able to do that. Unfortunately, it's not
>> maintained, and its C++ code is not compilable with today's
>> compilers. I guess that an experienced C++ programmer can fix this
>> rather easily. As usual, we need to find such a person who has time
>> and interest to do this...
> So all of the work that the UTF-8 TeX hyphenation group (Mojca, Arthur,
> ...) have done so far is therefore presumably only for languages/writing
> systems with fewer than 243 distinct characters; I had not realised that.
No, the patterns should work just fine with a large alphabet. (But the
last time I checked, XeTeX had problems hyphenating long words.)
1.) Patterns with more than 243 different characters are not usable in
the original TeX or pdfTeX, but I'm not aware of such a limitation in XeTeX.
2.) The fact that "patgen" is limited, "opatgen" is defunct and nobody
else has stepped up yet to create a new tool in some modern programming
language with built-in Unicode support (or with some modern C(++)
libraries) has nothing to do with XeTeX's ability to interpret
patterns. Not having a tool that can generate patterns for large
alphabets doesn't mean that XeTeX cannot handle such patterns.
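To illustrate point 2: applying patterns is a separate step from generating them. Below is a minimal sketch of Liang-style pattern application (the scheme TeX hyphenation patterns follow), written as an illustration of the technique and not as XeTeX's actual code. It works on Python Unicode strings, so nothing in the matching step depends on the size of the alphabet. The function names, the toy patterns and the `left_min`/`right_min` parameters (standing in for \lefthyphenmin/\righthyphenmin) are hypothetical.

```python
# Sketch of Liang-style hyphenation pattern application.
# Illustrative only, not XeTeX's implementation. Letters may be any
# Unicode code points because the code only compares Python strings.

def parse_pattern(pattern):
    """Split e.g. 'a1b' into ('ab', [0, 1, 0]): letters plus inter-letter values."""
    letters = []
    values = [0]
    for ch in pattern:
        if ch.isdigit():
            values[-1] = int(ch)   # value of the gap before the next letter
        else:
            letters.append(ch)
            values.append(0)
    return "".join(letters), values

def build_table(patterns):
    """Map each pattern's letter string to its list of gap values."""
    return dict(parse_pattern(p) for p in patterns)

def hyphenate(word, table, left_min=2, right_min=2):
    """Return the pieces of `word` split at odd-valued gaps.

    Assumes `word` is already in the case the patterns use.
    """
    w = "." + word + "."               # '.' marks the word boundaries
    points = [0] * (len(w) + 1)        # points[p] = value of the gap before w[p]
    for i in range(len(w)):
        for j in range(i + 1, len(w) + 1):
            values = table.get(w[i:j])
            if values is not None:
                for k, v in enumerate(values):
                    points[i + k] = max(points[i + k], v)
    pieces, start = [], 0
    for m in range(1, len(word)):      # gap between word[m-1] and word[m]
        allowed = m >= left_min and len(word) - m >= right_min
        if allowed and points[m + 1] % 2 == 1:
            pieces.append(word[start:m])
            start = m
    pieces.append(word[start:])
    return pieces

if __name__ == "__main__":
    table = build_table(["о1л", "о1к"])   # toy patterns, not real Russian ones
    print("-".join(hyphenate("молоко", table)))
```

With these two toy patterns the script prints "мо-ло-ко"; the same code works unchanged for patterns drawn from an alphabet of thousands of characters.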
But that's all off-topic.
The patterns can probably be created with patgen after some ugly
tweaking (as Jonathan suggested).
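Whether a given pattern set actually runs into the 243-character limit from point 1 is easy to check. A minimal sketch, assuming Liang-style patterns where digits are inter-letter values and "." marks a word boundary, so every other character counts as a letter; the helper names and sample patterns are hypothetical.

```python
# Sketch: count the distinct letters used by a set of Liang-style
# hyphenation patterns such as "1ba", ".ab3c" or "4č1".
# Assumption: digits are inter-letter values and "." is the word boundary;
# every other character is treated as a letter.

EIGHT_BIT_LIMIT = 243  # limit quoted in this thread for TeX/pdfTeX

def distinct_letters(patterns):
    """Return the set of letters (non-digit, non-'.') used by the patterns."""
    letters = set()
    for pattern in patterns:
        for ch in pattern:
            if not ch.isdigit() and ch != ".":
                letters.add(ch)
    return letters

def fits_eight_bit_engines(patterns, limit=EIGHT_BIT_LIMIT):
    """True if the alphabet does not exceed the quoted limit."""
    return len(distinct_letters(patterns)) <= limit

if __name__ == "__main__":
    sample = ["1ba", ".ab3c", "4č1", "ш1к"]
    print(sorted(distinct_letters(sample)))
    print(fits_eight_bit_engines(sample))
```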