[kadingira] Babel, LuaLaTeX and legacy documents with 8-bit font encodings

Sergei Golovan sgolovan at nes.ru
Tue Mar 28 09:19:38 CEST 2017


Hi!

Sometimes I want to compile legacy documents which use 8-bit font
encodings (and might even be in 8-bit input encodings), and I'd like
to do that using LuaLaTeX. The preamble lines which define the
encodings and language would then become

\usepackage[T2A]{fontenc}
\usepackage[utf8]{luainputenc}
\usepackage[russian]{babel}

In fact, even in new code I may want to do that because some fonts I'd
like to use don't have OTF versions.

There are two issues with this approach:

1) Babel loads the Russian hyphenation patterns dynamically, but they
are in UTF-8 encoding, hence they are unusable since the font encoding
is T2A. If I understand correctly, Babel loads the hyphenation
patterns by parsing language.dat and sourcing loadhyph-ru.tex,
which in turn detects that LuaLaTeX is a Unicode engine
and loads hyph-ru.tex with the UTF-8 encoded patterns.

To address this I've locally patched the loadhyph-ru.tex script and
added some code which uses hyph-ru.t2a.tex if \encodingdefault is T2A
(see https://github.com/hyphenation/tex-hyphen/issues/7 for details).
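
A rough sketch of the kind of check I mean is below. It is not the
actual patch from the issue above. The helper names \hypEnc and
\hypTarget are made up for illustration, and it assumes the code runs
under LaTeX with e-TeX primitives available (true for LuaLaTeX), so
that \encodingdefault is defined when the patterns are requested.

```latex
% Hypothetical sketch, not the real loadhyph-ru.tex patch.
% \hypEnc and \hypTarget are illustrative names only.
% Compare the two strings with all characters at catcode 12.
\edef\hypEnc{\detokenize\expandafter{\encodingdefault}}
\edef\hypTarget{\detokenize{T2A}}
\ifx\hypEnc\hypTarget
  % 8-bit document: use the T2A-encoded patterns
  \input hyph-ru.t2a.tex
\else
  % otherwise keep the default UTF-8 patterns
  \input hyph-ru.tex
\fi
```

Comparing detokenized copies avoids false mismatches from \long or
catcode differences between \encodingdefault and the literal string.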

2) After that another issue appears: LuaLaTeX assigns \lccode's
according to Unicode code points, so a few slots which hold Russian
letters in T2A become non-letters or get an incorrect \lccode (they are
"BC (\cyryo), "F7 (\cyrch), "9C (\CYRYO), "D7 (\CYRCH), "DF (\CYRYA)).
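
To make the problem concrete, the five slots could be repaired by hand
with explicit assignments. This is only a sketch, assuming the standard
T2A layout (uppercase "C0-"DF pairs with lowercase "E0-"FF, and "9C
pairs with "BC). It should go in the preamble, before Babel makes the
double quote a shorthand character. I have not checked here whether
the assignments must be in place before the patterns are loaded.

```latex
% Sketch only: give the five T2A slots listed above the \lccode
% values a T2A document expects. Assumes the standard T2A layout.
\lccode"BC="BC  % \cyryo  is lowercase, maps to itself
\lccode"9C="BC  % \CYRYO  lowercases to \cyryo
\lccode"F7="F7  % \cyrch  is lowercase, maps to itself
\lccode"D7="F7  % \CYRCH  lowercases to \cyrch
\lccode"DF="FF  % \CYRYA  lowercases to \cyrya at "FF
```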

So I have two questions:

1) What can we do to load the correct hyphenation patterns? (Probably,
it'd be better not to look into language.dat, since LuaTeX is flexible
enough and can load any patterns required.)

2) Which package should be responsible for assigning \lccode's for
different font encodings?

Cheers!
-- 
Sergei Golovan