[tex-hyphen] hyphenation/hyph-utf8, libhyphen, lefthyphenmin and righthyphenmin

Arthur Reutenauer arthur.reutenauer at normalesup.org
Sat May 28 00:54:33 CEST 2016


	Hi Eric,

> - the patterns are built for certain values of lefthyphenmin and
> righthyphenmin, and in particular do not function properly with smaller
> values than they have been built for. E.g. the pattern "1b2l", with
> lefthyphenmin = 1, would hyphenate the word xbla as x=bla, whereas it
> would not with lefthyphenmin = 2

  That’s correct when the patterns have been generated by patgen.  When
they’ve been written by hand, or are the result of some systematic
rules, there is usually no minimum value in the sense that lower
settings would generate incorrect results (there are however
recommendations for best typographic results).  The values used for
generation are unevenly documented; we’ve done our best to collect that
information, but in some cases it’s just guesswork.
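
  To make the quoted "1b2l" example concrete, here is a minimal Python
sketch of Liang-style pattern matching.  It is not the code of TeX or
libhyphen; the function names parse_pattern and hyphenate are made up
for this illustration, and it assumes lowercase input and
single-digit pattern values.

```python
def parse_pattern(pattern):
    """Split a Liang pattern such as "1b2l" into ("bl", [1, 2, 0])."""
    letters = []
    values = [0]  # values[i] is the weight in front of letters[i]
    for ch in pattern:
        if ch.isdigit():
            values[-1] = int(ch)
        else:
            letters.append(ch)
            values.append(0)
    return "".join(letters), values


def hyphenate(word, patterns, lefthyphenmin, righthyphenmin):
    """Return word with "=" at each permitted break (illustrative only)."""
    padded = "." + word + "."  # dots mark the word boundaries
    scores = [0] * (len(padded) + 1)  # scores[i] sits in front of padded[i]
    for pattern in patterns:
        letters, values = parse_pattern(pattern)
        for start in range(len(padded) - len(letters) + 1):
            if padded[start:start + len(letters)] == letters:
                for offset, value in enumerate(values):
                    pos = start + offset
                    scores[pos] = max(scores[pos], value)
    pieces = []
    last = 0
    for j in range(1, len(word)):
        # A break before word[j] needs an odd score and must respect
        # the minimum fragment lengths on both sides.
        odd = scores[j + 1] % 2 == 1
        if odd and j >= lefthyphenmin and len(word) - j >= righthyphenmin:
            pieces.append(word[last:j])
            last = j
    pieces.append(word[last:])
    return "=".join(pieces)


print(hyphenate("xbla", ["1b2l"], 1, 2))  # x=bla
print(hyphenate("xbla", ["1b2l"], 2, 2))  # xbla
```

  The pattern matches identically in both calls; only the final
filtering by lefthyphenmin differs, which is why patterns generated for
a given minimum can produce unwanted breaks when a smaller value is
used.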

> - I should get the patterns from
> hyph-utf8/tex/generic/hyph-utf8/patterns/txt/hyph-<lang>.pat.txt

  Yes, these are the plain text patterns.

> - those files do not mention lefthyphenmin nor righthyphenmin

  Indeed, the point was to be able to feed the patterns directly to
functions that expect lists of patterns (for example lang.patterns in
LuaTeX).

> - it seems that the authoritative source for left/righthyphenmin is in
> TL/tlpkg/tlpsrc/hyphen-<lang2>.tlpsrc

  No, the ultimate source for that information -- to the extent that we
could find it -- is hyph-utf8/source/generic/hyph-utf8/languages.rb (see
class Languages starting at line 250).  We’re in the process of moving
it to a machine-readable form in the TeX files that are the sources of
all pattern files (the plain text files hyph-<lang>.pat.txt are
generated from that).

> - those files use different conventions for the <lang> part, e.g. "fr" for
> the patterns but "french" for the tlpsrc.

  The files hyphen-*.tlpsrc describe the packages for TeX Live; their
names are chosen for convenience and do not necessarily correspond to
individual languages (see hyphen-indic.tlpsrc, for example).

  For individual language files we follow BCP 47; we’ve been very strict
about that.

> - libhyphen expects LEFTHYPHENMIN and RIGHTHYPHENMIN to be specified at the
> beginning of the pattern file it takes as input, and defaults to 2, 2

  I think the first part is right, but I don’t know what the default is.

> - at least the Irish patterns use greater values (righthyphenmin = 3)

  So does English.

> In other words, I need to present to libhyphen an Irish pattern file
> starting with:
> 
> UTF-8
> LEFTHYPHENMIN 2
> RIGHTHYPHENMIN 3

  That seems correct, but this list is not the best place to ask about
libhyphen.  I have seen additional lines in files used by LibreOffice,
relating to compound words, for example.
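
  As a rough sketch of the step described above, the following Python
helper prepends that header to a list of plain-text patterns.  The
function name build_libhyphen_dictionary is hypothetical, the sample
patterns are placeholders rather than real Irish patterns, and it
assumes the patterns in the .pat.txt file are separated by whitespace.
Whether libhyphen needs further preprocessing of the patterns, or the
additional compound-word lines, is not covered here.

```python
def build_libhyphen_dictionary(patterns_text, lefthyphenmin,
                               righthyphenmin, encoding="UTF-8"):
    """Return the text of a pattern file with the header quoted above."""
    # Assumption: patterns are whitespace-separated in the input text.
    patterns = patterns_text.split()
    header = [
        encoding,
        f"LEFTHYPHENMIN {lefthyphenmin}",
        f"RIGHTHYPHENMIN {righthyphenmin}",
    ]
    return "\n".join(header + patterns) + "\n"


# Placeholder patterns for illustration only, not taken from any real file.
sample = "1b2l\n.ab1\n"
print(build_libhyphen_dictionary(sample, 2, 3), end="")
```

  Passing the minimum values as arguments keeps them next to the
patterns they were collected for, since the plain-text pattern files do
not carry them.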

	Best,

		Arthur

