[tex-hyphen] extending `hyph-zh-latn-pinyin.tex'

Mojca Miklavec mojca.miklavec.lists at gmail.com
Wed Nov 21 21:57:40 CET 2018

Dear Werner,

On Wed, 21 Nov 2018 at 20:32, Werner LEMBERG wrote:
> Folks,
> I think it would be nice to extend `hyph-zh-latn-pinyin.tex' by
> covering Chinese syllables with tone marks – it's trivial to extend,
> say, pattern
>   a1b
> with
>   ā1b
>   á1b
>   ǎ1b
>   à1b
> and ditto for all other patterns.  However, such an extended file
> would only be usable by XeTeX and luatex.  I now wonder what route
> should be taken to stay compatible with pdfTeX and classical TeX.
>   (1) Another file.
>       This solution I rather dislike.

This is what I would go for.

But I would write a simple script in any programming language (Lua,
Ruby, Python, ...) and use it to generate both pattern files.

See the folder source/generic/hyph-utf8/languages for some examples.
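
As a rough sketch of what such a generator could look like (Python
here; the tone table and the names expand, build_pattern_sets and
write_pattern_file are made up for illustration and are not part of
hyph-utf8, and a real pattern file carries more than a bare \patterns
block):

```python
from itertools import product

# Assumed tone table: plain vowel -> its four tone-marked forms.
# How "ü" should be handled depends on the base file, so it is left out.
TONES = {
    "a": "āáǎà",
    "e": "ēéěè",
    "i": "īíǐì",
    "o": "ōóǒò",
    "u": "ūúǔù",
}


def expand(pattern):
    """Return the pattern itself followed by its tone-marked variants.

    Every vowel is varied independently, so a pattern with two vowels
    yields 5 * 5 combinations (an assumption, not a linguistic rule).
    """
    choices = [ch + TONES.get(ch, "") for ch in pattern]
    return ["".join(combo) for combo in product(*choices)]


def build_pattern_sets(base_patterns):
    """Return (ascii_patterns, unicode_patterns) from one source list."""
    ascii_patterns = list(base_patterns)
    unicode_patterns = [v for p in base_patterns for v in expand(p)]
    return ascii_patterns, unicode_patterns


def write_pattern_file(path, patterns):
    """Write a bare \\patterns block, one pattern per line, as UTF-8."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("\\patterns{\n")
        for p in patterns:
            f.write(p + "\n")
        f.write("}\n")
```

With this, expand("a1b") gives a1b, ā1b, á1b, ǎ1b, à1b, and both files
always come from the same source list, so they cannot drift apart.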

Some languages, such as all those with quotation marks in their
patterns, provide just the additional patterns in a separate file,
like this:

    % Unicode-aware engine (such as XeTeX or LuaTeX) only sees a
    % single (2-byte) argument
    \message{UTF-8 Italian hyphenation patterns}
    \input hyph-it.tex
    \input hyph-quote-it.tex
    % 8-bit engine (such as TeX or pdfTeX)
    \message{ASCII Italian hyphenation patterns}
    \input hyph-it.tex

but that's a very small subset of patterns. When the additional
patterns make up the vast majority, I really see no advantage in
providing just a partial file with the additional patterns.

>   (2) Some conditional code within \patterns to append the extended
>       patterns.
>       This would be my choice.  However, it clutters the patterns with
>       some TeX commands.

No, please don't use that. We parse those files (outside of TeX) to
generate plain text patterns, and TeX code would only cause us trouble.
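
To illustrate the problem, here is a deliberately naive extractor in
Python. It is a simplified stand-in, not the actual hyph-utf8 tooling;
extract_patterns is a made-up name, and \ifunicode below is a
hypothetical conditional used only as an example:

```python
import re


def extract_patterns(tex_source):
    """Return the whitespace-separated tokens inside \\patterns{...}."""
    # Drop TeX comments (from % to the end of the line).
    no_comments = re.sub(r"%.*", "", tex_source)
    # Assumes the block contains no nested braces.
    match = re.search(r"\\patterns\{([^}]*)\}", no_comments)
    return match.group(1).split() if match else []


clean = "\\patterns{\na1b % comment\n2b1\n}"
cluttered = "\\patterns{\na1b\n\\ifunicode\nā1b\n\\fi\n}"

print(extract_patterns(clean))
print(extract_patterns(cluttered))
```

For the clean input the result is just the patterns. For the cluttered
input, \ifunicode and \fi come out as if they were patterns, so every
external tool would need to learn how to interpret TeX conditionals.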

>   (3) Two \pattern blocks (one for XeTeX/luatex, another one for
>       pdftex/classical TeX) enclosed by conditionals.
>       A variant of (2) which might be preferable.

While slightly cleaner, this is hardly better than (2). We also ship
separate pattern files for monotonic and polytonic Greek, and two sets
of patterns for Serbian (in different scripts). While neither
situation is the same as for pinyin, I don't see any issue with
multiple files. The only disadvantage of two files is that one might
change one, but not the other. However, this is where using a script
to generate both comes in handy.

> What TeX macros or primitives should be used?

None :)
Android cannot parse TeX primitives :)