[tex-hyphen] Brainstorming about lualatex-hyphen in TeX Live

Mojca Miklavec mojca.miklavec.lists at gmail.com
Wed May 26 15:47:02 CEST 2010

On Tue, May 25, 2010 at 18:50, Manuel Pégourié-Gonnard wrote:
> - the main item is about generating a Xetex-specific language.dat,

1.) The question: do you agree with that at all? (Maybe Karl wants to
have a say in it as well.)
(and yes: we currently load zerohyph or maybe don't load anything at
all - it's not really important)

2.) I have tried to
    touch texlive/2010/texmf-var/tex/xetex/config/language.dat
but I don't understand why
    kpsewhich --engine xetex language.dat
still returns the original one.

3.) If we decide to go that way, do you agree to have something like
or maybe
or just a subset of them? (any syntax would be fine as long as it
would specify in which of the three families we want the patterns to
be used)

Even without the 8bit/xetex split for language.dat, it would still
offer valuable information, and you could avoid defining languages that
have no value for luatex. (Or you could define them and declare them
disabled, for that matter.)

4.) Also keep in mind that one might want to have a more "advanced"
version of patterns (for example Hungarian) in LuaTeX than in XeTeX.
How would that fit into your scheme?

> and other items are:
> - add a "comment" field (should not be a problem)

1.) Two comment fields actually (or maybe just one - "a", if you agree
with my point mentioned in "b").

1.a.) Consider the following:

% ushyphmax.tex, on the other hand, includes Gerard Kuiken's additional
% patterns; it is not frozen.
usenglishmax;  ushyphmax.tex
% FYI, ushyph.tex is Dr. Kuiken's smaller set of patterns; with today's
% large memories, there is no reason to use it, and we don't list it here.
% ushyph1.tex is another (historical) name for hyphen.tex.
% ushyph2.tex is another (historical) name for ushyph.tex.
% --karl

We would need to remove the comments completely unless a comment
field exists.

1.b.) I would like to have a comment field in the final lua file,
since the reason might be an "arbitrary complex string". What if you
ever need two specials and accidentally have some commas or
colons inside "reason"? I would much prefer to have
    comment="Disabled due to blablabla."
in the language.dat.lua.
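
As a rough sketch of the difference (the entry layout is an assumption
made for illustration, not the actual generated file), a separate field
keeps free-form text out of the value that has to be parsed:

```lua
-- Hypothetical language.dat.lua fragment, for illustration only.
return {
  ["ibycus"] = {
    -- Variant A: reason packed into the special. A colon or comma inside
    -- the reason would make the value ambiguous to split.
    -- luaspecial = "disabled:only usable in 8bit engines",

    -- Variant B: machine-readable value and free-form text kept apart.
    special = "disabled",
    comment = "Disabled due to blablabla.",
  },
}
```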

Though, from my point of view, I would not care about such languages
at all. You only need to know that the language is disabled since you
want to prevent loading it at format-generation time, right? Who cares
if the language simply remains undefined like any other nonexistent
language? You don't really need to explain why it's not defined; or
rather, it should be enough to have just a normal lua comment. You
should not need to print out a reason when a user requests the language
"ibycus". It simply won't exist. Do you think anyone will ever care?
In that case you don't need to support two different types of

> Am I missing something?

At format-generation time

1.) special="language0", -- should be dumped in the format

What do you do when you want another language to be dumped into the format
(for whatever weird reason you might think of, see also nr. 2) and
don't want it to have the number 0? :) Or if someone who has no
overview comes along and names another language "language0"?

Of course I didn't test anything, so I'm not sure about how this
works. Do you want/need a modification in hyphen-base or did you
"hardcode" that language into TL tools? (It would be nice to have a
statement in hyphen-base instead of hardcoding its generation.)

2.) Let's assume that the group of extra German hyphenation patterns
(or some third person) won't be willing to release a new version with
"plain patterns" by TL 2010 release date and that you will still want
the pattern to be loaded at format-generation time. What do you do in
such a case?

In reality we do need to do that at this very moment if you want the
language to be supported. You could have:
   mode="disabled" -- completely disabled (like ibycus)
   mode="format" -- dump into format at format-generation time
   mode="enabled" -- most languages
   (mode="empty" -- you could have that for arabic unless you prefer
some other method)
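
To make that concrete, here is a minimal sketch of how such entries
might look. The "mode" key and its values are only the proposal above,
not an existing feature, and every other field is left out:

```lua
-- Hypothetical sketch: "mode" is a proposed key, not an existing one.
return {
  ["english"]   = { mode = "format"   },  -- dumped at format-generation time
  ["slovenian"] = { mode = "enabled"  },  -- the case for most languages
  ["ibycus"]    = { mode = "disabled" },  -- completely disabled
  ["arabic"]    = { mode = "empty"    },  -- defined, but without patterns
}
```

A single key with a closed set of values would also leave room for a
second language dumped into the format without reusing "language0".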

3.) What happens if somebody modifies language.dat.lua without
remaking the format? Where are the languages stored: do you store the data
in the format, or read language.dat.lua at every luatex run?

1.) txtpatt, txthyph

What do you think about
    file => file_loader (or it may also stay at the old name)
    txtpatt => file_patterns
    txthyph => file_exceptions
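
With the renamed keys an entry might read as follows. This is only a
sketch: the pattern and exception file names are placeholders, not real
files, and the loader name is taken from the language.dat excerpt above:

```lua
-- Hypothetical entry using the proposed key names.
return {
  ["usenglishmax"] = {
    file_loader     = "ushyphmax.tex",     -- was: file
    file_patterns   = "<patterns>.txt",    -- was: txtpatt (placeholder name)
    file_exceptions = "<exceptions>.txt",  -- was: txthyph (placeholder name)
  },
}
```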

2.) In case that there are no hyphenation exceptions, I can use
Would that make sense to you or do you prefer loading an empty file?

3.) For farsi/arabic one could also have
    file_patterns=nil (or file_patterns="empty" if you want :)
to signal that there are no exceptions and no patterns.
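
One detail that may matter when choosing between the two spellings,
sketched with hypothetical entries (key names follow the renaming
suggested above):

```lua
-- Hypothetical entries, for illustration only.
return {
  -- In a Lua table constructor a nil value creates no key at all, so this
  -- entry is indistinguishable from one that never mentions file_patterns.
  ["farsi"]  = { file_patterns = nil },
  -- An explicit marker string survives in the table and can be tested for.
  ["arabic"] = { file_patterns = "empty" },
}
```

So nil works if "key absent" is defined to mean "no patterns", while a
marker string is needed if the two cases have to be told apart.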

4.a.) Now that you have modified TL tools that are controlling
generation of language.dat.lua you can finally be sure that
language.dat should not contain any other language but the ones in
language.dat.lua, so why do you still want to read language.dat and
check if "maybe there was someone else who has added something to

4.b.) You still need to think about Akira (W32TeX) and CS (MiKTeX),
even though MiKTeX doesn't support LuaTeX yet. I would find it a
nightmare if you required MiKTeX to mimic TL's on-the-fly
generation of language.dat.lua, if nothing else because there's no way
that you could control that.

5.) ibycus
luaspecial="disabled:only usable in 8bit engines"

We may keep luaspecial, but once all the other aspects are considered:
do we really need one with ibycus? My order of preference:
a) don't include it in language.dat.lua file at all (controlled with
engines= or enable_8bit=...,enable_utf/luatex/xetex)
b) mode="disabled" based on the same keywords (engines= or enable_x=true/false)
c) using special="disabled" based on the same keywords

Yes, I probably missed other points, but that should be enough for the
first round (I did not look into TeX code at all nor do I plan to do

