[XeTeX] How do mapping files affect hyphenation?

Zdenek Wagner zdenek.wagner at gmail.com
Fri Feb 24 14:20:51 CET 2012

2012/2/24 Arthur Reutenauer <arthur.reutenauer at normalesup.org>:
>  I don’t know the technical answer to that question, but considering
> what you say:
>>                I would have expected the transliterated text to be
>> hyphenated according to the original Russian rules but actually it is
>> not hyphenated at all:
>  That hints that the text is hyphenated after transliteration, using

It should be so. I have not studied the XeTeX source code, but it seems
to me that the TECkit transliteration is applied after all macro
expansion has finished and the characters enter the horizontal list.
The horizontal list is then subject to the paragraph-breaking algorithm.
The characters after transliteration will usually have different
widths, and even the number of characters may vary (щ -> shch), thus the
paragraph-breaking algorithm must use the transliterated text. One may
assume that it would be sufficient to take the Cyrillic patterns,
transliterate them and append them. However, this will not work. Both тс
and ц are transliterated as ts, thus with this simplistic approach a
transliterated ц may be hyphenated in the middle. It will be necessary
to take the original list of hyphenated words, transliterate them and
feed this list to patgen.
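
A minimal Python sketch of the ambiguity, under stated assumptions: the
TRANSLIT table is a hypothetical, heavily simplified mapping (not a real
TECkit map), the two hyphenated words are illustrative only, and
apply_naive_pattern is a toy stand-in for pattern matching, not how TeX
actually applies patterns.

```python
# Hypothetical, simplified transliteration table (assumption, not a TECkit map).
TRANSLIT = {
    "а": "a", "е": "e", "и": "i", "к": "k", "о": "o",
    "п": "p", "с": "s", "т": "t", "ц": "ts", "щ": "shch",
}


def transliterate(text):
    """Map each character; hyphens and unknown characters pass through."""
    return "".join(TRANSLIT.get(ch, ch) for ch in text)


def apply_naive_pattern(word, pattern):
    """Toy matcher: insert a hyphen wherever the pattern's letters occur."""
    left, right = pattern.split("-")
    return word.replace(left + right, left + "-" + right)


# Illustrative word-list entries; the hyphen marks a permitted break.
hyphenated = ["от-сек", "пти-ца"]

# Transliterating whole hyphenated words keeps the breaks where they belong,
# giving a list that could be fed to patgen.
latin_words = [transliterate(w) for w in hyphenated]

# Transliterating a pattern that allows a break between т and с gives "t-s",
# which can no longer tell тс apart from ц.
naive_pattern = transliterate("т-с")
wrong = apply_naive_pattern(transliterate("птица"), naive_pattern)

print(latin_words)  # word list for patgen
print(wrong)        # break lands inside transliterated ц
```

The transliterated word list keeps the break before ц, whereas the
transliterated pattern splits it, which is why the word list rather than
the patterns has to be transliterated.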

> Russian hyphenation rules, and these obviously don’t know how to hyphenate
> words written using the Latin script.  That doesn’t surprise me all that
> much.  Indeed, transliteration rules could map sequences of characters
> to one single character in the output (unlikely for Russian,
> admittedly); and if the hyphenation patterns command a break in the
> middle of the original characters, where should the transliterated text
> be hyphenated?
>  Actually, that situation isn’t so unlikely: I could imagine a
> transliteration system that would for example map “кс” to “x” (as in
> Александер), and it’s entirely possible to have a breakpoint between ‘к’
> and ‘с’ (I didn’t check if that’s the case in the default patterns).
>        Arthur

Zdeněk Wagner
