[kadingira] Babel, LuaLaTeX and legacy documents with 8-bit font encodings

Sergei Golovan sgolovan at gmail.com
Thu Mar 30 16:30:05 CEST 2017


On Wed, Mar 29, 2017 at 5:15 PM, Javier Bezos <jbezosl at gmail.com> wrote:
>
>> Ideally, just replacing pdflatex by lualatex should work without any
>> changes in the code.
>
> Ideally, but that won't happen. At least you must remove inputenc.

I can replace it with luainputenc, which works fine with pdflatex as well,
so this change isn't too disruptive for me.

>
>> Currently, Babel always loads hyphenation patterns in UTF-8 encoding,
>> which isn't always correct.
>
> Actually in LuaTeX it pre-loads nothing (well, english, just for
> compatibility). It's up to you to define which patterns will be loaded
> with a language.dat file (there is a default one, but the mechanism
> was devised to allow multiple configurations, so they are not
> exactly «patched» files). By default, in Unicode engines only
> Unicode patterns are loaded, which seems sensible.

I understand that it doesn't pre-load patterns for languages other than
English, which is good because, for now, different hyphenation
patterns are required for documents that use different font encodings
(T2A vs TU for Russian, for example). Unfortunately, there's
no way to load patterns in T2A encoding without a local modification
of language.dat or loadhyph-ru.tex.

>
> A way to safely load a set of patterns from within the document itself
> (and even combine them) is in my 'todo' list, but priority is not
> very high.

I'm not sure that's necessary. The current situation, in which the patterns
are loaded together with their languages, looks fine. The problem is with
the font encodings (and only with legacy font encodings).

As far as I understand, currently there are two ways of loading the
hyphenation patterns:

1) Use language.dat.lua, which contains a hash of hyphenation info
for all installed languages (the hash entries list both the loader file
and the patterns in text format; I guess they aren't supposed to be used
together).

2) Use language.dat, which is a config file with the loader file names
(the approach currently employed in Babel).

Neither way lets the LuaTeX engine load the patterns in different
encodings, so some changes in the hyph-utf8 package are required
to make that possible. I can think of the following:

1) Add some code to the pattern loaders like loadhyph-ru.tex
which would check whether a legacy font encoding is in use and load
the corresponding patterns. I can't say that I like this idea very much,
even though I've implemented it locally. As far as I understand,
loadhyph-ru.tex is designed for preloading patterns, so it'd be better
to keep it as simple as possible.

2) Augment the language hashes in language.dat.lua with a field which
would point to files with 8-bit patterns (like hyph-ru.t2a.tex, initially
designed for pTeX).

3) Augment the language hashes in language.dat.lua with a field or fields
which would specify a font encoding. After that, load the text patterns
(like hyph-ru.pat.txt) but re-encode them on the fly. With this approach,
additional encoding files would have to be installed with hyph-utf8.

4) Maybe some other ideas?
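
To make option 3 a bit more concrete, here is a rough Python sketch of
the "re-encode on the fly" step. It is only an illustration, not
hyph-utf8 or Babel code: the names T2A_SLOTS and reencode_pattern are
made up, the sample pattern is invented, and the table assumes that T2A
keeps the lowercase letters а..я in slots 0xE0..0xFF (as CP1251 does).
Letters outside that range (ё, for example) are left out on purpose.

```python
# Hypothetical sketch: turn one UTF-8 hyphenation pattern into 8-bit bytes.
# Assumption: T2A stores lowercase а..я at 0xE0..0xFF (same as CP1251).
# This toy table is incomplete; a real encoding file would cover every slot.
T2A_SLOTS = {chr(ord("а") + i): 0xE0 + i for i in range(32)}


def reencode_pattern(pattern, slots):
    """Map each letter to its 8-bit slot; ASCII digits and dots pass through."""
    out = bytearray()
    for ch in pattern:
        if ch in slots:
            out.append(slots[ch])
        elif ord(ch) < 0x80:
            out.append(ord(ch))
        else:
            # The letter has no slot in the target font encoding.
            raise ValueError("no slot for %r in this encoding" % ch)
    return bytes(out)


def reencode_patterns(text, slots):
    """Re-encode a whitespace-separated list of patterns."""
    return [reencode_pattern(p, slots) for p in text.split()]


if __name__ == "__main__":
    # ".а1б" is an invented pattern, used only to show the byte mapping.
    print(reencode_pattern(".а1б", T2A_SLOTS))
```

The point of keeping the table separate from the function is that one
text pattern file could then serve several legacy encodings, each with
its own table.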

The problem with approaches 2 and 3 is that they'd require rewriting
the pattern-loading code in Babel.

What do you think about all of this? Is it worth doing something
about legacy font encodings? Or should we maybe just abandon them and
use them only with the old engines?

Cheers!

-- 
Sergei Golovan
