[tex-hyphen] How and where to generate language.dat.lua?

Manuel Pégourié-Gonnard mpg at elzevir.fr
Mon May 3 20:06:07 CEST 2010

On 03/05/2010 11:08, Mojca Miklavec wrote:
> I agree with all that.

>> The down side is,
>> when german-x is updated, hyph-utf8 needs to be updated too
> When german-x is updated, they'll probably want to update patterns in
> hyph-utf8 anyway.
The actual patterns are in hyph-utf8? I was under the impression that
dehyph-exptl was a separate package, both on CTAN and in TL.

>> Another possibility is to handle language.dat.lua in the same way we handle
>> language.{dat,def} in TL currently. It would only require new (optional)
>> attributes for the AddHyphen postaction, and the code to handle it of course.
>> Pro: more modular and scalable. Con: needs coding.
> It's an option, but there's another big con: whenever you'll want some
> change, you'll have to update tlmgr. I don't think that this is such a
> great idea.
Well, I don't think the format of language.dat is very likely to evolve; it
currently contains all the required information. But that's a point in favour of
a single "special" field: we can have new types of specials without changing the
file format.

> I have some comments about special=<...> It's a bit ugly in my
> opinion. I would use
>     comment=":some,arbitrary#comment%" (not so extreme of course)
> and rather additional fields than "special".

I don't have any objection to comment = <arbitrary string>, but I don't find
special ugly.

> In particular when the
> number of options is limited anyway. There is no need to add option to
> all the languages. You may have optional options and then use
> something like
>     "empty_patterns"=true
> only for farsi, arabic and zerohyph (could be other option names). The
> "special" field seems a bit ugly to me.
One point: the three types of special are mutually exclusive, so it seems
logical to have only one field for that. Imagine an entry with
	disabled = true,
	empty_patterns = true,
how should the code handle it?
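
To make the point concrete, here is a rough sketch (in Python purely for
illustration; the real loader would be Lua). The function names are made up for
this example, and only the two flag names quoted above are used. With separate
booleans the loader needs an explicit conflict check; with one "special" field
the conflicting entry cannot be written in the first place.

```python
# Hypothetical flag list: only these two names appear in this thread.
EXCLUSIVE_FLAGS = ("disabled", "empty_patterns")


def check_flags(entry):
    """Boolean-flag design: detect entries that set more than one flag."""
    set_flags = [flag for flag in EXCLUSIVE_FLAGS if entry.get(flag)]
    if len(set_flags) > 1:
        raise ValueError(
            "mutually exclusive flags set: " + ", ".join(set_flags))
    # Return the single flag that is set, or None for a normal entry.
    return set_flags[0] if set_flags else None


def check_special(entry, allowed):
    """Single-field design: only the value itself needs validating."""
    special = entry.get("special")
    if special is not None and special not in allowed:
        raise ValueError("unknown special: %r" % (special,))
    return special
```

The second function also takes the set of allowed values as a parameter, so a
new kind of special only means a longer list, not a new field.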

Another point is, it may make future evolution easier, see above.

> The third option would be if your lua scripts would read a database
> file from german-x.

The idea occurred to me, but I didn't write it down at once so I forgot about
it. I just checked the technical point: LuaTeX's kpse library can find all
the files with a given name, since 0.51 (and I don't see any reason to try
supporting older versions of LuaTeX here). So, it is possible to read all
language.dat.lua files and merge them (complaining if two of them try to define
an entry for the same language), so that every package shipping patterns could
also ship its own bit of language.dat.lua.
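
A minimal sketch of the merge step, assuming each language.dat.lua fragment has
already been located and loaded into a table keyed by language name. It is in
Python only to show the logic; the actual code would be Lua and would get the
file list from kpse. The function name and the (source, table) input shape are
assumptions of this sketch, not existing code.

```python
def merge_language_tables(fragments):
    """Merge (source, table) pairs into one table.

    'source' identifies where a fragment came from (for error messages);
    'table' maps language names to their entries. A language defined by
    two fragments is reported as an error rather than silently overridden.
    """
    merged = {}
    origin = {}  # language name -> source that first defined it
    for source, table in fragments:
        for language, entry in table.items():
            if language in merged:
                raise ValueError(
                    "language %r defined in both %s and %s"
                    % (language, origin[language], source))
            merged[language] = entry
            origin[language] = source
    return merged
```

Keeping track of where each entry came from costs one extra table and lets the
complaint name both offending files.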

> In my opinion that would be best in the long run.

I agree. I'll write the code and tests in that way if everybody else agrees (or
if nobody disagrees in, say, the next two days).

> in the long run I don't think that that database
> really belongs to hyph-utf8.

That's my opinion too.

> But that may be changed later. If we need
> to support two more files, we can add them, but authors of German
> patterns might want a different approach (even better luatex support)
> at some point anyway.
Well, better support will come at some point, but afaics, the current scheme is
perfectly compatible with future enhancements, which could happen with only
changes on the macro/lua side, reusing the architecture we're developing. (I
actually have ideas for this, but now is not the time.)

