[tex-hyphen] Latin Hyphenation when using utf8

Arthur Reutenauer arthur.reutenauer at normalesup.org
Tue Jun 22 16:09:40 CEST 2010

>                                                             Arthur, if
> I'm wrong about the macrons, please correct me. I didn't really check
> it.

  The L7X encoding for Baltic languages has all the vowels with macron
because they're used in Latvian, but I doubt you'll find any font
encoding with vowels with breve, except for 'a' with breve, which is
used in Romanian (and thus appears in EC, for example).

  But the advice to use UTF-8 encoding and XeTeX is sound: it makes
little sense to spend time on hyphenating Latin with macrons and breves
using some 8-bit encoding, when using Unicode solves the problem
straight away (once we have the additional patterns, that is, but
writing those shouldn't be too difficult).

> 3.) This is the answer that we got from Petr Sojka (for some unrelated problem):
>> \hyphequiv table might be the right way of doing that (it was suggested
>> some 13 years ago) to make patterns independent of font encodings.
>> Anyway, for the purpose of unifying char positions just for hyphenation
>> and not lowercasing, one can use etex's \savinghyphcodes macro:
>> see sec. 3.10 of http://www.tug.org/teTeX/texmf-dist/doc/etex/base/etex-man.pdf

  But \hyphequiv has been dropped from LuaTeX, and XeTeX doesn't seem to
know about it either.  I wonder whether the best solution in that case
isn't simply to generate two files: one for UTF-8 engines with all the
patterns (including those with macrons, etc.), and one for 8-bit engines
from which every pattern containing a character that has no equivalent
in the 8-bit encoding has been stripped.
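
  To make the two-file idea concrete, here is a rough sketch in Python
of the stripping step.  It is only an illustration under assumptions:
the function names, the sample patterns and the set of encodable
characters are all made up for the example, and a real script would
read the actual pattern file and the actual 8-bit encoding table.

```python
def letters(pattern):
    """Return the letters of a pattern, skipping the digits (hyphenation
    weights) and the '.' word-boundary marker of the pattern syntax."""
    return [ch for ch in pattern if not ch.isdigit() and ch != "."]


def split_patterns(patterns, encodable):
    """Return (unicode_patterns, eight_bit_patterns).

    The Unicode list keeps every pattern; the 8-bit list keeps only the
    patterns whose letters all exist in the target 8-bit encoding.
    """
    unicode_patterns = list(patterns)
    eight_bit_patterns = [
        p for p in patterns if all(ch in encodable for ch in letters(p))
    ]
    return unicode_patterns, eight_bit_patterns


# Hypothetical data: a few invented patterns and an invented set of
# characters assumed to be available in the 8-bit encoding.
ENCODABLE = set("abcdefghijklmnopqrstuvwxyz") | {"æ", "œ"}
SAMPLE = [".a2b", "1ā", "ă1b", "2s3t", "æ1"]

full, stripped = split_patterns(SAMPLE, ENCODABLE)
```

  With this sample, the patterns containing a-macron and a-breve end up
only in the Unicode list, while the other three go into both files.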

