[tex-live] [tex-hyphen] german hyphenation patterns

Mojca Miklavec mojca.miklavec.lists at gmail.com
Tue Jun 10 03:37:54 CEST 2008


Hello German team,

On Mon, Jun 9, 2008 at 9:17 PM, Stephan Hennig wrote:
>
> An update from the German hyphenation pattern team: experimental German
> hyphenation patterns are provided as two packages.
>
> 1. Heiko Oberdiek has kindly contributed a new package hyphsubst (part
> of the oberdiek bundle), that hooks into TeX's language handling and
> provides means to replace hyphenation patterns bound to a language.
> This package should be on its way to CTAN, currently.

Good to hear :)

> 2. Experimental patterns are packaged as a simple zip archive containing
> pattern files for traditional and reformed orthography together with a
> short manual and a README.  These files are nearly completed, two items
> remain to be done:
>
>  a. I've recently found two small bugs in the patterns that I'd
>     like to see fixed.  This should be a matter of hours.

OK.

> If item b is not fixed by, say Wednesday, we'll upload the patterns to
> CTAN anyway (as package german-x).  For TeX Live 2008 we'd like to see
> the following thing to happen magically:
>
> 1. Update the oberdiek bundle to contain the new package hyphsubst.
>
> 2. Add package german-x containing experimental patterns to TeX Live
> 2008 and add them to language.dat.  (Roughly the following lines:
>
>  german-x-<date> dehypht-x-<date>.tex
>  =german-x-latest
>  ngerman-x-<date> dehyphn-x-<date>.tex
>  =ngerman-x-latest
>
> where the exact value of <date> will be announced later.)

That's all fine, with one big "but": the same language.dat is read by
BOTH pdftex and xetex. So if you want to put dehyphn-x-<date>.tex into
language.dat, the patterns REALLY need to be XeTeX-compatible, which
means you would first have to write a XeTeX wrapper
xu-dehyphn-x-<date>.tex.

And no, you really do not want to do that. Here is an example of what
you might end up with:

  \catcode`\?=7
  % Define the accent macro " in such a way that it
  % expands to single letters in Unicode
  \catcode`\"=13
  \def"#1{\ifx#1a??e4\else \ifx#1o??f6\else \ifx#1u??fc\else
      \errmessage{Hyphenation pattern file corrupted!}%
    \fi\fi\fi}
  %   - patterns with umlauts are ok
  \def\n#1{#1}
  %   - define \3 to be character "00DF (\ss in Unicode)
  \def\3{??df}
  %   - define \9 to throw an error
  \def\9{\errmessage{Hyphenation pattern file corrupted!}}
  %   - duplicated patterns to support font encoding OT1 are not wanted
  \def\c#1{}
  %
  \let\PATTERNS=\patterns
  \def\patterns{% at the \patterns command in dehypht.tex...
    \endgroup % end group containing local definitions from dehypht
    \begingroup % and start our own (to match \endgroup in dehypht)
    %
    \PATTERNS % and then load the real patterns
  }

> 3. Do UTF-8 conversion of the experimental patterns for whatever your
> reasons are.  (I'm not yet acquainted with XeTeX and LuaTeX. :/ )
>
> Is that OK for you?

It sounds fine to me to use your patterns by default when XeTeX loads
"normal" german patterns.

If you're providing additional packages for pdf(La)TeX, that's probably
helpful for the user, and it doesn't affect the process of converting
the patterns to UTF-8. If someone writes such a package, go ahead. XeTeX
may then load an additional language that it's not really going to use.

(XeTeX also loads Greek patterns in ibycus encoding. They're of no use
of course, but they're at least harmless.)

-------

A summary: please, please, please ... do take a look at

1.) svn://tug.org/texhyphen/trunk/tex/loadhyph/loadhyph-xx.tex (German
is not the best example, so take a look at "sl" instead; the file will
probably change too, as it's autogenerated anyway)
and
2.) svn://tug.org/texhyphen/trunk/tex/patterns/utf8/lang-de-1996.tex
for an example of what we would really like to see: plain patterns,
utf-8 encoded, no catcodes, no lccodes, no TeX macros
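
To make "clean" a bit more concrete, here is a minimal Python sketch of a
checker. It is only an illustration, not part of the texhyphen repository:
the function name find_problems is made up, and it assumes that the only
control sequences a clean file may contain are \patterns and \hyphenation.

```python
import re

# Assumption: the only control sequences tolerated in a "clean" file are the
# \patterns and \hyphenation block openers; anything else counts as a macro.
ALLOWED_COMMANDS = {"patterns", "hyphenation"}


def find_problems(raw: bytes) -> list:
    """Return human-readable reasons why `raw` is not a clean pattern file."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError as err:
        return [f"not valid UTF-8: {err}"]

    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        code = line.split("%", 1)[0]  # ignore everything after a TeX comment sign
        # A control sequence is a backslash followed by letters or by one other char.
        for name in re.findall(r"\\([A-Za-z]+|.)", code):
            if name not in ALLOWED_COMMANDS:
                problems.append(f"line {number}: TeX macro \\{name}")
        if '"' in code:
            problems.append(f"line {number}: double-quote accent shorthand")
        if "^^" in code:
            problems.append(f"line {number}: ^^ notation instead of UTF-8")
    return problems
```

An empty list means the file passed; find_problems(b"a^^e4\n") reports
"line 1: ^^ notation instead of UTF-8".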

With two languages that goal was unavoidable, and with languages such
as German and French I do not dare to change anything, as it would break
compatibility with OT1-encoded fonts.

The overall scheme, locations, comments ... all that might change, but
the idea is to split content (the patterns themselves) from
interpretation (adapting them for xetex, or for pdftex using T1 or T2A
encoding, depending on needs).
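
As a rough illustration of that split, the following Python sketch keeps one
UTF-8 pattern string as the "content" and derives two "interpretations" from
it. The function names are hypothetical, and the T1_SLOTS table is my own
assumption about the T1 (Cork) slots of the four German letters; a real
wrapper would need the full encoding table.

```python
# Assumption: slots of the four German letters in the T1 (Cork) font encoding.
T1_SLOTS = {"ä": 0xE4, "ö": 0xF6, "ü": 0xFC, "ß": 0xFF}


def for_xetex(pattern: str) -> str:
    """XeTeX reads UTF-8 natively, so the pattern is passed through unchanged."""
    return pattern


def for_t1(pattern: str) -> str:
    """Rewrite non-ASCII letters in TeX's ^^xx notation for a T1 8-bit engine."""
    out = []
    for ch in pattern:
        if ord(ch) < 128:
            out.append(ch)  # ASCII letters, digits and dots stay as they are
        elif ch in T1_SLOTS:
            out.append(f"^^{T1_SLOTS[ch]:02x}")  # ^^ needs two lowercase hex digits
        else:
            raise ValueError(f"no T1 slot known for {ch!r}")
    return "".join(out)
```

For example, for_t1(".äb3") gives ".^^e4b3", while for_xetex(".äb3") returns
the string untouched. The pattern file itself never has to know which engine
reads it.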

And for the sake of that, it would be really helpful to submit "clean"
files. I can certainly write a script to convert your patterns into
the proper format, but I guess that you would want updates to happen
automatically whenever you update your patterns, rather than waiting
for someone else to write a new wrapper script and submit it to the
proper location.
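
Such a conversion script could look roughly like the Python sketch below. It
is not an existing tool: the name to_utf8 is made up, and the handling of
"a/"o/"u, \n{...}, \c{...}, \3 and \9 simply mirrors the macro definitions in
the XeTeX wrapper quoted earlier, so it assumes your experimental files use
the same conventions.

```python
import re

UMLAUTS = {"a": "ä", "o": "ö", "u": "ü"}


def _umlaut(match):
    """Map the accent shorthand "a, "o, "u to a single Unicode letter."""
    letter = match.group(1)
    if letter not in UMLAUTS:
        raise ValueError(f'unexpected accent shorthand "{letter}')
    return UMLAUTS[letter]


def to_utf8(line: str) -> str:
    """Convert one line of macro-encoded patterns to plain UTF-8 text."""
    line = re.sub(r"\\c\{[^{}]*\}", "", line)       # drop OT1 duplicates
    line = re.sub(r"\\n\{([^{}]*)\}", r"\1", line)  # unwrap \n{...}
    line = line.replace("\\3", "ß")                 # \3 stands for sharp s
    if "\\9" in line:
        # In the wrapper, \9 is an error once the \c{...} groups are gone.
        raise ValueError("\\9 found outside \\c{...}")
    return re.sub(r'"(.)', _umlaut, line)
```

With these rules, to_utf8('\\n{1"a}') yields '1ä' and to_utf8('\\c{s\\9}')
yields an empty string. Run once per line over a pattern file, the output
could be regenerated automatically on every pattern update.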

Mojca

