[tex-hyphen] How and where to generate language.dat.lua?

Mojca Miklavec mojca.miklavec.lists at gmail.com
Mon May 3 11:08:25 CEST 2010

On Mon, May 3, 2010 at 03:34, Manuel Pégourié-Gonnard wrote:
> Hi,
> Now that the format and meaning of language.dat.lua seems stable, it's probably
> time to decide how to handle it. Here is a brief summary of the situation:
> 1. For maximal safety, we use a plain text version of patterns and exceptions
> for dynamic loading; for this we need information not contained in
> language.(dat|def), namely the name of the files containing those versions.
> 2. This information is looked for in a file language.dat.lua. (a) Languages
> without entry in this file are dumped in the format the plain old way (Knuthian
> usenglish is always dumped as \language0). (b) It is possible to disable dynamic
> loading (hence loading at all) of a particular language via entries in this file.
> 3. Currently, languages loadable are (a subset of) those declared in
> language.{dat,def}.
> 4. But in the future, one can imagine languages having an entry in language.dat.lua
> only, hence being only dynamically loadable in LuaTeX (macro support yet to be
> written, but I have ideas for that, should not be difficult now) without being
> dumped in other (non-LuaTeX-based) formats.

I agree with all that.
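
To make sure we read points 2 to 4 the same way, here is a rough Python
sketch of the decision they describe. It is only an illustration, not the
actual loader: the function name, the "disabled" key and the file names in
the usage lines are assumptions of mine, not the real language.dat.lua
format.

```python
def classify_languages(declared, lua_entries):
    """Decide how each language would be handled, per points 2-4.

    declared    -- names listed in language.{dat,def}
    lua_entries -- dict mapping a name to its (hypothetical) language.dat.lua entry
    """
    plan = {}
    for name in declared:
        if name == "usenglish":
            # Knuthian usenglish is always dumped as \language0.
            plan[name] = "dumped"
        elif name not in lua_entries:
            # No entry: dumped in the format the plain old way.
            plan[name] = "dumped"
        elif lua_entries[name].get("disabled", False):
            # Hypothetical flag: the entry disables loading altogether.
            plan[name] = "disabled"
        else:
            plan[name] = "dynamic"
    # Point 4: languages that only have an entry in language.dat.lua.
    for name, entry in lua_entries.items():
        if name not in plan:
            plan[name] = "disabled" if entry.get("disabled", False) else "dynamic"
    return plan


# Illustrative data only; the file names are placeholders.
plan = classify_languages(
    ["usenglish", "german", "farsi", "latin"],
    {
        "german": {"patterns": "german.pat.txt"},
        "farsi": {"disabled": True},
        "newlang": {"patterns": "newlang.pat.txt"},
    },
)
```

With that data, usenglish and latin are dumped, german and newlang are
loaded dynamically, and farsi is not loaded at all.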

> Now, to the best of my knowledge, entries in language.{dat,def} basically come
> from three sources:
> (a) package hyphen-base
> (b) tex-hyphen (hyph-utf8)
> (c) german-x
> The down side is,
> when german-x is updated, hyph-utf8 needs to be updated too

When german-x is updated, they'll probably want to update patterns in
hyph-utf8 anyway.

> Another possibility is to handle language.dat.lua in the same way we handle
> language.{dat,def} in TL currently. It would only require new (optional)
> attributes for the AddHyphen postaction, and the code to handle it of course.
> Pro: more modular and scalable. Con: needs coding.

It's an option, but there's another big con: whenever you want a
change, you'll have to update tlmgr. I don't think that this is such a
great idea.

> New attributes would be: patterns=<file with plain text patterns>,
> hyphenation=<file with plain text exceptions>, special=<code for special
> languages> (optional), and something to determine if the language should go to
> language.{dat,def} only, language.dat.lua only, or both.

I have some comments about special=<...> It's a bit ugly in my
opinion. I would use
    comment=":some,arbitrary#comment%" (not so extreme of course)
and rather additional fields than "special", in particular since the
number of options is limited anyway. There is no need to add an option to
all the languages. You may have optional options and then use
something like
only for farsi, arabic and zerohyph (could be other option names). The
"special" field seems a bit ugly to me.
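
As a rough illustration of why quoted values are harmless, here is a
minimal Python sketch that splits such a key=value attribute line. It is
not how tlmgr parses AddHyphen lines; the function name is hypothetical
and the parsing approach (Python's shlex) is just an assumption for the
example.

```python
import shlex


def parse_attributes(line):
    """Split 'key=value key2="quoted value"' into a dict (illustrative only)."""
    attrs = {}
    # shlex.split honours double quotes and, by default, does not treat '#'
    # as a comment character, so arbitrary text survives inside quotes.
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if not sep:
            raise ValueError("attribute without '=': " + token)
        attrs[key] = value
    return attrs


attrs = parse_attributes('name=farsi comment=":some,arbitrary#comment%"')
```

Optional fields then simply show up as extra keys for the few languages
that need them, and are absent everywhere else.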

> An intermediate possibility is to use a monolithic language.dat.lua for now,
> since it is readily available, and implement the more modular option later.
> (Pro: nothing to do now, con: now would be the best moment for me to implement
> that, since later I'll have to remember things first.)

A third option would be for your Lua scripts to read a database
file from german-x. In my opinion that would be best in the long run.
In the short term we can add a few more definitions to the current
language.dat.lua, but in the long run I don't think that database
really belongs in hyph-utf8. But that may be changed later. If we need
to support two more files, we can add them, but the authors of the German
patterns might want a different approach (even better LuaTeX support)
at some point anyway.
