[tex-hyphen] German patterns in TL2010 (was: How and where to generate language.dat.lua?)

Stephan Hennig mailing_list at arcor.de
Mon May 24 18:33:34 CEST 2010


[
    * starting a new thread,
    * summing up the current state of German patterns,
    * long
]

On 03.05.2010 20:06, Manuel Pégourié-Gonnard wrote:
> On 03/05/2010 11:08, Mojca Miklavec wrote:
>
>>> The down side is, when german-x is updated, hyph-utf8 needs to
>>> be updated too
>>
>> When german-x is updated, they'll probably want to update patterns
>> in hyph-utf8 anyway.
>>
> The actual patterns are in hyph-utf8? I was under the impression
> that dehyph-exptl was a separate package, both on CTAN and in TL.

Everything said is correct.

In theory, dehyph-exptl is a separate package intended for people eager
to play with the experimental patterns.  Up until now, though, the
package (that is, its pattern providers) has only been aimed at 8-bit
TeX.

In practice, the patterns are already part of hyph-utf8.  During the
great pattern encoding normalisation effort, Mojca decided to make our
patterns the default ones for German in 16-bit engines.  While the idea
of merging two changes that break backwards compatibility (modified
paragraph breaking in LuaTeX and new German patterns) into a single
event is great, this is a bit ahead of our original intention.  I, at
least, just haven't put much thought into 16-bit engines so far.


For TeX Live 2009, the current state is this:

    * Werner has converted the patterns to UTF-8.  He also provided a
      .tex pattern wrapper that converts the patterns back into T1
      encoding if an 8-bit engine is detected.  8-bit TeX engines
      require the dehyph-exptl package with its time-stamped patterns
      to be installed for this.  This package is already part of TL2009,
      and the patterns are enabled in language.dat by default, making
      them available as languages 'german-x-2009-06-19' etc.

    * XeTeX loads the unmodified UTF-8 patterns provided by hyph-utf8,
      which are the same as in dehyph-exptl, but uses its own pattern
      wrapper.  As long as the patterns for XeTeX and LuaTeX aren't
      frozen, this means hyph-utf8 is updated whenever we provide new
      patterns.

    * As for LuaTeX, I have no idea which patterns the LuaTeX in TL2009
      actually loads.

Note that the pattern wrappers provided by dehyph-exptl and hyph-utf8
use quite similar code, but do different things:

    hyph-utf8:  used for languages 'german' and 'ngerman'

      if (engine == 8bit) then load traditional patterns
      else load experimental patterns

      No re-encoding in either case.

    dehyph-exptl:  used for languages 'german-x-<date>' etc.

      if (engine == 8bit) then re-encode patterns to T1 encoding
      else load patterns unmodified

I have just one question about this procedure.  Is the code for the
8/16-bit engine switch OK, or are there better alternatives (\ifxetex
etc.)?  This is from loadhyph-de-1996.tex:

> \begingroup
> % Test whether we received one or two arguments
> \def\testengine#1#2!{\def\secondarg{#2}}
> % That's Tau (as in Taco or ΤΕΧ, Tau-Epsilon-Chi), a 2-byte UTF-8 character
> \testengine Τ!\relax
> % Unicode-aware engine (such as XeTeX or LuaTeX) only sees a single (2-byte) argument
> \ifx\secondarg\empty
>     \message{UTF-8 German Hyphenation Patterns (Reformed Orthography)}
>     \input hyph-de-1996.tex
> \else
>     \message{German Hyphenation Patterns (Reformed Orthography)}
>     % Kept for the sake of backward compatibility, but newer and better patterns by WL are available.
>     \input dehyphn.tex
> \fi
> \endgroup
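
For comparison, a primitive-based switch might look like the sketch
below.  It assumes that testing for \XeTeXrevision and \directlua is
enough to recognise XeTeX and LuaTeX; the messages are placeholders
only, and other Unicode-aware engines would need further tests.

    \begingroup
    % \csname...\endcsname turns an undefined name into \relax; the
    % group keeps that side effect local.
    \expandafter\ifx\csname XeTeXrevision\endcsname\relax
        \expandafter\ifx\csname directlua\endcsname\relax
            % Neither primitive exists: assume an 8-bit engine.
            \message{8-bit engine}
        \else
            \message{LuaTeX}
        \fi
    \else
        \message{XeTeX}
    \fi
    \endgroup

Unlike the Tau test, such a switch asks for specific engines rather
than for the ability to read UTF-8, so it would have to be extended
for every new engine.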


For TeX Live 2010, I hope we can agree on the following goals:

    * 8-bit TeX
        No change (ever).  Load traditional patterns in the format.
        Experimental patterns are provided by package dehyph-exptl.

    * XeTeX
        No change.  Load experimental patterns, by default.  Make them
        available as traditional languages 'german' and 'ngerman'.

    * LuaTeX
        Load experimental patterns from new language.dat.lua, by default.
        Make them available as traditional languages 'german' and
        'ngerman'.


What does that mean for German patterns?  Not much, fortunately:

    * For 8-bit TeX, language.dat has correct entries for the following
      languages:

         german,
         ngerman,

         and

         german-x-2009-06-19,
         german-x-latest,
         ngerman-x-2009-06-19,
         ngerman-x-latest.

    * Apart from my question about the 8/16-bit switch above, all is
      well with XeTeX, too.

    * LuaTeX loads languages from language.dat.lua.  A proper entry is
      required there, as well as a plain-text version of the pattern
      files.  Since our patterns are already in hyph-utf8, Mojca has
      done all the necessary work.

      The question is whether there should be entries for languages
      'german-x-<date>' etc.  I'd say no, and I'll emphasize in our
      documentation that package dehyph-exptl is not required for
      LuaTeX (and XeTeX).

      The only thing left for us to do is to remember to also provide
      plain-text versions of the patterns in future releases.  Did I
      miss something?
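
For illustration, the 8-bit entries from the first item might look
roughly like this in language.dat.  Apart from loadhyph-de-1996.tex,
the file names are assumptions based on the naming used by hyph-utf8
and dehyph-exptl and should be checked against the installed packages.
A line starting with '=' declares a synonym for the preceding language.

    % traditional patterns, loaded via the hyph-utf8 wrappers
    german  loadhyph-de-1901.tex
    ngerman loadhyph-de-1996.tex
    % experimental patterns from dehyph-exptl (assumed file names)
    german-x-2009-06-19  dehypht-x-2009-06-19.tex
    =german-x-latest
    ngerman-x-2009-06-19 dehyphn-x-2009-06-19.tex
    =ngerman-x-latest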


I'm sorry for the confusion about the state of German patterns.  There 
must have been some.  At least, I have learnt much about pattern loading 
during the last weeks.  Comments and corrections are welcome (hence the 
lengthy mail)!

Best regards,
Stephan Hennig

