[tex-hyphen] How and where to generate language.dat.lua?

Manuel Pégourié-Gonnard mpg at elzevir.fr
Mon May 3 03:34:01 CEST 2010


Now that the format and meaning of language.dat.lua seem stable, it's probably
time to decide how to handle it. Here is a brief summary of the situation:

1. For maximal safety, we use a plain text version of patterns and exceptions
for dynamic loading; for this we need information not contained in
language.(dat|def), namely the names of the files containing those versions.
2. This information is looked for in a file language.dat.lua. (a) Languages
without an entry in this file are dumped in the format the plain old way (Knuthian
usenglish is always dumped as \language0). (b) It is possible to disable dynamic
loading (hence loading at all) of a particular language via entries in this file.
3. Currently, languages loadable are (a subset of) those declared in
4. But in the future, one can imagine languages having an entry in language.dat.lua
only, hence being only dynamically loadable in LuaTeX (macro support yet to be
written, but I have ideas for that, should not be difficult now) without being
dumped in other (non-LuaTeX-based) formats.
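
To make points 1 and 2 concrete, here is a rough Python sketch of the decision
they describe. It is only an illustration, not the actual loader: the
"disabled" key, the language names "xx" and "yy" and the file names are all
made up, since the text above only says that an entry can disable loading, not
how it is spelled.

```python
from typing import Dict


def loading_strategy(language: str, entries: Dict[str, dict]) -> str:
    """Return "dump", "disabled" or "dynamic" for one language.

    `entries` stands in for the parsed content of language.dat.lua.
    """
    # Knuthian usenglish is always dumped in the format (as \language0).
    if language == "usenglish":
        return "dump"
    entry = entries.get(language)
    # Point 2a: no entry means the plain old way, dumped in the format.
    if entry is None:
        return "dump"
    # Point 2b: an entry may switch loading off entirely.
    # The "disabled" key is an assumption of this sketch.
    if entry.get("disabled", False):
        return "disabled"
    # Point 1: otherwise load the plain text patterns and exceptions on demand.
    return "dynamic"


# Hypothetical entries, for illustration only.
entries = {
    "xx": {"patterns": "xx-patterns.txt", "hyphenation": "xx-exceptions.txt"},
    "yy": {"disabled": True},
}
```

With these entries, "xx" is loaded dynamically, "yy" is not loaded at all, and
any language missing from the table falls back to being dumped.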

A special property of language.dat.lua compared to the other files is, it never
hurts to have more languages in this file than the user wants activated, due to
point 3 (and the new dynamic mechanism). An incomplete language.dat.lua doesn't
hurt too much either, due to 2a.

Now, to the best of my knowledge, entries in language.{dat,def} basically come
from three sources:
(a) package hyphen-base
(b) tex-hyphen (hyph-utf8)
(c) german-x

Since the first one is basically frozen, and (b) and (c) are very cooperative,
it is probably possible to use a monolithic language.dat.lua, shipped with
hyph-utf8, using information from their repository and from german-x. The up
side is, there is no need to change anything on the TL side. The down side is,
when german-x is updated, hyph-utf8 needs to be updated too, and it'll become
more complicated if there are ever more actors.

Another possibility is to handle language.dat.lua in the same way we handle
language.{dat,def} in TL currently. It would only require new (optional)
attributes for the AddHyphen postaction, and the code to handle it of course.
Pro: more modular and scalable. Con: needs coding.

New attributes would be: patterns=<file with plain text patterns>,
hyphenation=<file with plain text exceptions>, special=<code for special
languages> (optional), and something to determine if the language should go to
language.{dat,def} only, language.dat.lua only, or both.
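
As a sketch of what the modular option could look like, the Python below turns
records carrying the attributes listed above into the text of a
language.dat.lua. Everything beyond the attribute names patterns, hyphenation
and special is an assumption made for illustration: the `target` field and its
values "dat", "lua" and "both", the record type, the file names, and the
layout of the generated Lua table.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HyphenRecord:
    name: str                      # language name
    patterns: str                  # file with plain text patterns
    hyphenation: str               # file with plain text exceptions
    special: Optional[str] = None  # code for special languages (optional)
    target: str = "both"           # hypothetical selector: "dat", "lua" or "both"


def render_entry(rec: HyphenRecord) -> str:
    """Render one record as a Lua table entry (assumed layout)."""
    fields = [
        f"    patterns = '{rec.patterns}',",
        f"    hyphenation = '{rec.hyphenation}',",
    ]
    if rec.special is not None:
        fields.append(f"    special = '{rec.special}',")
    return "\n".join([f"  ['{rec.name}'] = {{", *fields, "  },"])


def render_language_dat_lua(records: List[HyphenRecord]) -> str:
    """Keep only records meant for language.dat.lua and render the file."""
    body = [render_entry(r) for r in records if r.target in ("lua", "both")]
    return "\n".join(["return {", *body, "}"]) + "\n"


# Hypothetical records, for illustration only.
records = [
    HyphenRecord("xx", "xx-patterns.txt", "xx-exceptions.txt"),
    HyphenRecord("yy", "yy-patterns.txt", "yy-exceptions.txt", target="dat"),
    HyphenRecord("zz", "zz-patterns.txt", "zz-exceptions.txt",
                 special="some-code", target="lua"),
]
```

Here "yy" is left out of the generated file because it is marked for
language.{dat,def} only, while "zz" would appear in language.dat.lua but not in
the other two files.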

An intermediate possibility is to use a monolithic language.dat.lua for now,
since it is readily available, and implement the more modular option later.
(Pro: nothing to do now, con: now would be the best moment for me to implement
that, since later I'll have to remember things first.)


