[tex-hyphen] ruhyphen and ot2

Tue Jun 13 01:14:41 CEST 2006

>  PS> ... \usepackage[OT2]{fontenc} ...
>  PS> \showhyphens{perevodyat}
> [...]
>  PS> \OT2/cmr/m/n/10 pe-re-vo-dy-at
>
>  PS> So I guess it is necessary to add patterns like y8a when using
>  PS> OT2?
>
> generally speaking, yes; but an attempt to do that will give error
> messages because such patterns will conflict with other entries in
> typical pattern sets (such as ruhyphal.tex).

OK. At first I assumed I wouldn't be able to hyphenate OT2-encoded
Russian, but the text in ruhyphen.tex made me think it would be
possible. I suggest you add a warning about this right there.

> because of these "abuse" of the ligatures, it is best to avoid using
> the OT2-encoded fonts, because the hyphenation significantly suffers
> in this case. whenever possible, use the 8-bit encoded fonts (e.g. T2A).

I know... I'm helping a friend with a book (in Russian, but with lots
of quoted text in German, Dutch, Polish, Latin and several other
languages written in Latin letters) which she is already writing in
OT2 (and T1). It may seem crazy, but she is used to that by now and
has never used any Cyrillic input method for her (Swedish) keyboard.
Her main complaint with OT2 is the unnecessary "ts" ligature for "c".

I have written a small program to preprocess the file to add "\-"s for
her according to a (small but growing) word hyphenation list, but then
I found the remark in ruhyphen.tex that made me think it would work
with the patterns.
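
A preprocessor of that kind could look roughly like the sketch below.
It is not the actual program, only a minimal illustration: the word
list entries, the names build_table and add_discretionaries, and the
exact-case matching are all assumptions made for the example.

```python
import re

# Hypothetical word list: "-" marks each permitted break point.
HYPHENATED = ["pe-re-vo-dyat", "go-vo-rit"]

def build_table(entries):
    """Map each plain word to the same word with TeX discretionary hyphens."""
    return {e.replace("-", ""): e.replace("-", r"\-") for e in entries}

def add_discretionaries(text, table):
    """Replace whole words found in the table; leave everything else alone."""
    # The lookbehind skips control sequences such as \perevodyat and
    # prevents matching in the middle of a longer word.
    pattern = re.compile(r"(?<![\\A-Za-z])[A-Za-z]+")
    return pattern.sub(lambda m: table.get(m.group(0), m.group(0)), text)

if __name__ == "__main__":
    table = build_table(HYPHENATED)
    print(add_discretionaries(r"\showhyphens{perevodyat}", table))
```

Because the break points are written so that "ya" stays together, the
ligature is never split. Capitalised forms would need their own
entries in this sketch.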

Maybe I should add a preprocessing stage that does an OT2->T2A
conversion instead, but that needs a bit of LaTeX parsing, since the
file now uses several commands and environments that change the
font encoding temporarily.
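
The core of such a conversion, leaving out the LaTeX parsing entirely,
might be sketched as follows. The transliteration table is a partial
one, assumed from the usual OT2 conventions and limited to lowercase
letters; it should be checked against the font's real ligature table.
The function name ot2_to_cyrillic is made up for the example, and the
output is Unicode text that would still need a suitable inputenc
setting to be used with T2A.

```python
# Partial table, ASSUMED from the usual OT2 transliteration conventions.
TRANSLIT = {
    "shch": "щ", "zh": "ж", "kh": "х", "ts": "ц", "ch": "ч", "sh": "ш",
    "yu": "ю", "ya": "я",
    "a": "а", "b": "б", "v": "в", "g": "г", "d": "д", "e": "е", "z": "з",
    "i": "и", "k": "к", "l": "л", "m": "м", "n": "н", "o": "о", "p": "п",
    "r": "р", "s": "с", "t": "т", "u": "у", "f": "ф", "c": "ц", "y": "ы",
}
# Try longer keys first, mimicking how the font ligatures combine letters.
KEYS = sorted(TRANSLIT, key=len, reverse=True)

def ot2_to_cyrillic(text):
    """Convert a string known to be OT2 transliteration to Cyrillic."""
    out = []
    i = 0
    while i < len(text):
        for key in KEYS:
            if text.startswith(key, i):
                out.append(TRANSLIT[key])
                i += len(key)
                break
        else:
            # Characters not in the table are copied unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)

if __name__ == "__main__":
    print(ot2_to_cyrillic("perevodyat"))
```

A real tool would apply this only to the spans that are actually set
in OT2, which is exactly where the LaTeX parsing comes in.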

I don't know if this is on topic for the tex-hyphen list, but are
there any plans for how things like this should ideally work in the
future?

To me it seems more natural to handle transliterated input like this
through *inputenc* instead. Wouldn't it be possible to do it that way
and then have the same hyphenation rules for "native" Cyrillic text
and transliterated text? I don't know how this really works, but I
guess it wouldn't be that different from the utf8 inputenc, where a
byte sequence is also turned into a single character.