[tex-hyphen] Newbie: Question about pattern structure
Arthur Reutenauer
arthur.reutenauer at normalesup.org
Thu Aug 22 15:27:57 CEST 2019
Hello Nathalie,
Interesting subject you chose! The reference for TeX’s hyphenation
algorithm, and in my opinion the best complete explanation of it, is
Appendix H of the
TeXbook (https://www.worldcat.org/oclc/826569026). You’ll find there
everything you need to get a basic understanding. For a more in-depth
analysis, see Frank Liang’s PhD thesis at http://tug.org/docs/liang/. I
thought the English Wikipedia’s article on “Hyphenation Algorithm”
offered a summary of Appendix H, but I can’t find it, so in short:
In order to hyphenate a word in a given language, you need a list of
patterns for that language. Let’s say the word is “hyphenation” and the
patterns are Knuth and Liang’s file hyphen.tex (available from CTAN:
http://mirror.ctan.org/systems/knuth/dist/lib/hyphen.tex). You start by
finding all the patterns that, ignoring the digits, match the word:
hy3ph
he2n
hena4
hen5at
1na
n2at
1tio
2io
o2n
That is to say, “hyph” matches “hyphenation” (because you ignore the
3); so does “hen”, etc. Once you’ve got that list, you write each
pattern’s digits into the word at the positions where the pattern
matches, taking the maximum if there are several possible digits at
the same position. In this case you get:
hy3phe2n5a4t2io2n
There are three places where two digits would be possible: after the
‘e’, you could insert either 2 (from “he2n”) or 1 (from “1na”), so you
take 2, the maximum of the two; before the ‘a’ you have 5 from “hen5at”
and 2 from “n2at”, so here you get 5; and after the ‘a’ you have “hena4”
that produces 4, and “1tio” that produces 1, so you take 4.
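
For what it’s worth, the matching and “take the maximum” steps can be
sketched in a few lines of Python. This is only an illustration under
some assumptions: it uses just the nine patterns listed above rather
than the whole of hyphen.tex, it ignores the “.” word-boundary marker
that some patterns in that file carry, and the names parse and annotate
are mine, not anything from TeX. Note that “2io” also puts a 2 between
the ‘t’ and the ‘i’; nothing competes with it there.

```python
import re

# Only the nine patterns quoted above, not the full hyphen.tex.
PATTERNS = ["hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io", "o2n"]


def parse(pattern):
    """Return the pattern's letters and the digit at each inter-letter slot."""
    letters = re.sub(r"\d", "", pattern)
    digits = [0] * (len(letters) + 1)  # digits[i] sits just before letters[i]
    i = 0
    for ch in pattern:
        if ch.isdigit():
            digits[i] = int(ch)
        else:
            i += 1
    return letters, digits


def annotate(word, patterns=PATTERNS):
    """Insert into word the maximum digit found at each position."""
    levels = [0] * (len(word) + 1)  # levels[k] sits just before word[k]
    for pattern in patterns:
        letters, digits = parse(pattern)
        start = word.find(letters)
        while start != -1:  # a pattern may match more than once
            for offset, d in enumerate(digits):
                levels[start + offset] = max(levels[start + offset], d)
            start = word.find(letters, start + 1)
    out = []
    for k, ch in enumerate(word):
        if levels[k]:
            out.append(str(levels[k]))
        out.append(ch)
    if levels[len(word)]:
        out.append(str(levels[len(word)]))
    return "".join(out)


print(annotate("hyphenation"))  # hy3phe2n5a4t2io2n
```

Keeping one slot per gap between letters (plus one at each end) makes
the “maximum” rule a plain max() per slot, whatever order the patterns
are applied in.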
Then, you may hyphenate the word wherever the digit is odd; where it is
even, or where there is no digit, you may not. Hence: hy-phen-ation.
That’s all.
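
The last step is small enough to show as well. Again a sketch only: the
function name hyphenate is made up for this example, and it leaves out
TeX’s \lefthyphenmin and \righthyphenmin, which additionally forbid
breaks too close to either end of the word.

```python
def hyphenate(annotated):
    """Turn odd digits into hyphens and drop even ones."""
    out = []
    for ch in annotated:
        if ch.isdigit():
            if int(ch) % 2 == 1:  # odd: a break is allowed here
                out.append("-")
        else:
            out.append(ch)
    return "".join(out)


# The annotated word, including the 2 that "2io" places before the 'i';
# being even, that digit does not create a break.
print(hyphenate("hy3phe2n5a4t2io2n"))  # hy-phen-ation
```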
Hope this helps!
Best,
Arthur