[tex-hyphen] German patterns in TL2010 (was: How and where to generate language.dat.lua?)
Stephan Hennig
mailing_list at arcor.de
Mon May 24 18:33:34 CEST 2010
[
* starting a new thread,
* summing up the current state of German patterns,
* long
]
On 03.05.2010 20:06, Manuel Pégourié-Gonnard wrote:
> On 03/05/2010 11:08, Mojca Miklavec wrote:
>
>>> The downside is that when german-x is updated, hyph-utf8 needs to
>>> be updated too.
>>
>> When german-x is updated, they'll probably want to update the patterns
>> in hyph-utf8 anyway.
>>
> The actual patterns are in hyph-utf8? I was under the impression
> that dehyph-exptl was a separate package, both on CTAN and in TL.
Everything said is correct.
In theory, dehyph-exptl is a separate package intended for people eager
to play with the experimental patterns. Until now, however, the package
(or rather we, the pattern providers) only had 8-bit TeX in mind.
In practice, the patterns are already part of hyph-utf8. During the
great pattern encoding normalisation effort, Mojca decided to make our
patterns the default German patterns for 16-bit engines. The idea of
conflating two events that break backward compatibility (modified
paragraph breaking in LuaTeX and new German patterns) into a single one
is great, but it is a bit ahead of our original plans. I, at least,
just haven't given 16-bit engines much thought so far.
For TeX Live 2009, the current state is this:
* Werner has converted the patterns to UTF-8. He has also provided a
.tex pattern wrapper that converts the patterns back to T1 encoding
if an 8-bit engine is detected. For this, 8-bit engines require the
dehyph-exptl package, with its time-stamped patterns, to be installed.
This package is already part of TL2009, and its patterns are enabled
in language.dat by default, making them available as the languages
'german-x-2009-06-19' etc.
* XeTeX loads the unmodified UTF-8 patterns provided by hyph-utf8,
which are the same as those in dehyph-exptl, but through hyph-utf8's
own pattern wrapper. As long as the patterns for XeTeX and LuaTeX
aren't frozen, this means hyph-utf8 has to be updated whenever we
release new patterns.
* As for LuaTeX, I have no idea what patterns LuaTeX from TL2009
actually loads.
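To make "converts the patterns back to T1 encoding" concrete: in UTF-8 each German special letter takes two bytes, whereas T1 (Cork) encoding puts it in a single byte slot. The Python sketch below models only that byte mapping. It is not how Werner's TeX wrapper is implemented. It is also an assumption that ä, ö, ü and ß are the only non-ASCII letters that matter, so the sketch rejects anything else instead of guessing.

```python
# A model of re-encoding UTF-8 pattern text into T1 (Cork) bytes.
# Only the four German letters below are mapped; a complete converter
# would need the full T1 table. ASCII characters pass through unchanged.
T1_GERMAN = {
    "ä": 0xE4,
    "ö": 0xF6,
    "ü": 0xFC,
    "ß": 0xFF,  # in T1, 0xFF is ß (germandbls); 0xDF holds "SS"
}


def to_t1(pattern: str) -> bytes:
    """Re-encode one UTF-8 hyphenation pattern as T1 bytes."""
    out = bytearray()
    for ch in pattern:
        if ord(ch) < 0x80:
            out.append(ord(ch))  # ASCII letters, digits and '.'
        elif ch in T1_GERMAN:
            out.append(T1_GERMAN[ch])
        else:
            raise ValueError(f"no T1 mapping for {ch!r} in this sketch")
    return bytes(out)


if __name__ == "__main__":
    pattern = "1ä2ß"  # hypothetical pattern, for illustration only
    print(len(pattern.encode("utf-8")), "bytes in UTF-8")  # 6
    print(len(to_t1(pattern)), "bytes in T1")              # 4
```

The same pattern shrinks from six bytes to four. This is why an 8-bit engine cannot simply read the UTF-8 file and treat each byte as one letter.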
Note that the pattern wrappers provided by dehyph-exptl and hyph-utf8 use
quite similar code but do different things:

  hyph-utf8: used for languages 'german' and 'ngerman'
    if (engine == 8bit) then load traditional patterns
    else load experimental patterns
    No re-encoding in either case.

  dehyph-exptl: used for languages 'german-x-<date>' etc.
    if (engine == 8bit) then re-encode patterns to T1 encoding
    else load patterns unmodified
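In other words, the engine test controls different things in the two wrappers. In hyph-utf8 it selects *which* pattern set is loaded. In dehyph-exptl it selects the *encoding* of one and the same experimental pattern set. The small Python model of that table below is only an illustration. The enum and function names are invented and do not correspond to anything in either package.

```python
from enum import Enum


class Engine(Enum):
    EIGHT_BIT = "8-bit"    # e.g. pdfTeX
    UNICODE = "Unicode"    # XeTeX, LuaTeX


def hyph_utf8_wrapper(engine: Engine) -> str:
    """Languages 'german'/'ngerman': choose the pattern set, never re-encode."""
    if engine is Engine.EIGHT_BIT:
        return "load traditional patterns"
    return "load experimental patterns"


def dehyph_exptl_wrapper(engine: Engine) -> str:
    """Languages 'german-x-<date>' etc.: same patterns, engine picks encoding."""
    if engine is Engine.EIGHT_BIT:
        return "load experimental patterns re-encoded to T1"
    return "load experimental patterns unmodified (UTF-8)"


if __name__ == "__main__":
    for engine in Engine:
        print(f"{engine.value:8} hyph-utf8:    {hyph_utf8_wrapper(engine)}")
        print(f"{engine.value:8} dehyph-exptl: {dehyph_exptl_wrapper(engine)}")
```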
I have just one question about this procedure: is the code for the
8-/16-bit engine switch OK, or are there better alternatives (\ifxetex
etc.)? This is from loadhyph-de-1996.tex:
> \begingroup
> % Test whether we received one or two arguments
> \def\testengine#1#2!{\def\secondarg{#2}}
> % That's Tau (as in Taco or ΤΕΧ, Tau-Epsilon-Chi), a 2-byte UTF-8 character
> \testengine Τ!\relax
> % Unicode-aware engine (such as XeTeX or LuaTeX) only sees a single (2-byte) argument
> \ifx\secondarg\empty
> \message{UTF-8 German Hyphenation Patterns (Reformed Orthography)}
> \input hyph-de-1996.tex
> \else
> \message{German Hyphenation Patterns (Reformed Orthography)}
> % Kept for the sake of backward compatibility, but newer and better patterns by WL are available.
> \input dehyphn.tex
> \fi
> \endgroup
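The test works because Τ (U+03A4) is two bytes in UTF-8 (0xCE 0xA4). An 8-bit engine reads them as two character tokens, so #1 gets the first byte and #2 the second. A Unicode-aware engine reads a single token, so #2 is empty. The Python sketch below models only this argument split (undelimited #1 takes one token, delimited #2 runs up to "!"). It is not a TeX tokenizer, and it assumes the wrapper file itself is saved as UTF-8.

```python
def split_args(tokens: list[str]) -> tuple[str, list[str]]:
    r"""Model \def\testengine#1#2!: #1 takes one token, #2 runs up to '!'."""
    first = tokens[0]
    end = tokens.index("!", 1)
    return first, tokens[1:end]


def tokens_8bit(text: str) -> list[str]:
    """An 8-bit engine sees one character token per byte of the UTF-8 file."""
    return [chr(b) for b in text.encode("utf-8")]


def tokens_unicode(text: str) -> list[str]:
    """XeTeX and LuaTeX see one character token per Unicode code point."""
    return list(text)


def is_unicode_engine(tokens: list[str]) -> bool:
    r"""Mirror \ifx\secondarg\empty: an empty #2 means a Unicode engine."""
    _, second = split_args(tokens)
    return not second


if __name__ == "__main__":
    source = "\u03a4!"  # Greek capital Tau (U+03A4) followed by the delimiter
    print(is_unicode_engine(tokens_8bit(source)))     # False
    print(is_unicode_engine(tokens_unicode(source)))  # True
```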
For TeX Live 2010, I hope we can agree on the following goals:
* 8-bit TeX
No change (ever). Load traditional patterns in the format.
Experimental patterns are provided by package dehyph-exptl.
* XeTeX
No change. Load experimental patterns by default. Make them
available as the traditional languages 'german' and 'ngerman'.
* LuaTeX
Load experimental patterns from the new language.dat.lua by default.
Make them available as the traditional languages 'german' and
'ngerman'.
What does that mean for German patterns? Not much, fortunately:
* For 8-bit TeX, language.dat has correct entries for the following
languages: german and ngerman, as well as german-x-2009-06-19,
german-x-latest, ngerman-x-2009-06-19 and ngerman-x-latest.
* Apart from my question about the 8-/16-bit switch above, all is
well with XeTeX, too.
* LuaTeX loads languages from language.dat.lua. This requires a proper
entry there, as well as plain-text versions of the pattern files.
Since our patterns are already part of hyph-utf8, Mojca has already
done all the necessary work.
The question is whether there should also be entries for the
languages 'german-x-<date>' etc. I'd say no, and I'll emphasize in
our documentation that the dehyph-exptl package is not required for
LuaTeX (or XeTeX).
The only thing left for us to do is to remember to provide plain-text
versions of the patterns in future releases, too. Did I miss anything?
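About the language.dat entries for 8-bit TeX listed above: language.dat is a plain list. Each line names a language and its pattern file, a line starting with "=" declares a synonym for the entry directly above it, and "%" starts a comment. That is how german-x-latest can share the patterns of german-x-2009-06-19. The Python sketch below reads such a file. The sample entries and file names are illustrative assumptions, not a copy of TL2009's actual language.dat. Any fields after the pattern file (such as an exceptions file) are ignored.

```python
def parse_language_dat(text: str) -> dict[str, str]:
    """Map every language name, including '=' synonyms, to its pattern file."""
    languages: dict[str, str] = {}
    current_file = None
    for raw in text.splitlines():
        line = raw.split("%", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        if line.startswith("="):
            # Synonym: reuses the patterns of the entry directly above it.
            if current_file is None:
                raise ValueError(f"synonym without a preceding entry: {line}")
            languages[line[1:].strip()] = current_file
        else:
            fields = line.split()
            if len(fields) < 2:
                raise ValueError(f"entry without a pattern file: {line}")
            name, pattern_file = fields[0], fields[1]  # extra fields ignored
            languages[name] = pattern_file
            current_file = pattern_file
    return languages


# Hypothetical excerpt; the file names are illustrative only.
SAMPLE = """\
% German entries
german loadhyph-de-1901.tex
ngerman loadhyph-de-1996.tex
german-x-2009-06-19 dehypht-x-2009-06-19.tex
=german-x-latest
ngerman-x-2009-06-19 dehyphn-x-2009-06-19.tex
=ngerman-x-latest
"""

if __name__ == "__main__":
    for name, pattern_file in parse_language_dat(SAMPLE).items():
        print(f"{name:22} {pattern_file}")
```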
I'm sorry for any confusion about the state of the German patterns;
there must have been some. At least I have learnt a lot about pattern
loading over the last few weeks. Comments and corrections are welcome
(hence the lengthy mail)!
Best regards,
Stephan Hennig