[tex-hyphen] Newbie: Question about pattern structure

Arthur Reutenauer arthur.reutenauer at normalesup.org
Thu Aug 22 15:27:57 CEST 2019


	Hello Nathalie,

  Interesting subject you chose!  The standard reference, and in my
opinion the best complete explanation of TeX’s hyphenation algorithm, is
Appendix H of the TeXbook (https://www.worldcat.org/oclc/826569026).
There you’ll find everything you need for a basic understanding.  For a more in-depth
analysis, see Frank Liang’s PhD thesis at http://tug.org/docs/liang/.  I
thought the English Wikipedia’s article on “Hyphenation Algorithm”
offered a summary of Appendix H, but I can’t find it, so in short:

  In order to hyphenate a word in a given language, you need a list of
patterns for that language.  Let’s say the word is “hyphenation” and the
patterns are those in Knuth and Liang’s file hyphen.tex (available from
CTAN: http://mirror.ctan.org/systems/knuth/dist/lib/hyphen.tex).  You
start by finding all the patterns whose letters, ignoring the digits,
occur somewhere in the word:

	hy3ph
	he2n
	hena4
	hen5at
	1na
	n2at
	1tio
	2io
	o2n

  That is to say, “hyph” matches “hyphenation” (because you ignore the
3); so does “hen”, etc.  Once you’ve got that list, you insert each
pattern’s digits into the original word at the place where the pattern
matches, taking the maximum if there are several possible digits for the
same position.  In this case you get:

	hy3phe2n5a4t2io2n

  There are three places where two digits would be possible: after the
‘e’, you could insert either 2 (from “he2n”) or 1 (from “1na”), so you
take 2, the maximum of the two; before the ‘a’ you have 5 from “hen5at”
and 2 from “n2at”, so here you get 5; and after the ‘a’ you have “hena4”
that produces 4, and “1tio” that produces 1, so you take 4.
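
  If it helps to see the matching and the "take the maximum" step
spelled out, here is a minimal sketch in Python.  It is only an
illustration, not how TeX implements it: the function names parse and
mark are my own, it uses just the nine patterns listed above rather than
the whole of hyphen.tex, and it ignores the "." that real patterns can
use to anchor a match at the start or end of a word.

```python
import re

# The nine patterns from the walk-through above. A real implementation would
# load the complete pattern list (for instance from hyphen.tex) instead.
PATTERNS = ["hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io", "o2n"]

def parse(pattern):
    """Split a pattern into its letters and the digit sitting in each gap.

    'he2n' -> ('hen', [0, 0, 2, 0]); gap k is the position just before letter k.
    """
    letters = re.sub(r"[0-9]", "", pattern)
    digits = [0] * (len(letters) + 1)
    gap = 0
    for ch in pattern:
        if ch.isdigit():
            digits[gap] = int(ch)
        else:
            gap += 1
    return letters, digits

def mark(word, patterns):
    """Return the word with the winning (maximum) digit written into each gap."""
    best = [0] * (len(word) + 1)
    for pattern in patterns:
        letters, digits = parse(pattern)
        start = word.find(letters)
        while start != -1:  # a pattern may match in more than one place
            for k, d in enumerate(digits):
                best[start + k] = max(best[start + k], d)
            start = word.find(letters, start + 1)
    out = []
    for k, ch in enumerate(word):
        if best[k]:
            out.append(str(best[k]))
        out.append(ch)
    if best[len(word)]:
        out.append(str(best[len(word)]))
    return "".join(out)

print(mark("hyphenation", PATTERNS))
```

  Keeping one list of digits per gap, rather than editing the string as
you go, makes the maximum rule a one-liner and avoids having to track
how earlier insertions shift the later positions.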

  Then, you may hyphenate the word wherever the digit is odd (here the
3 and the 5); where it is even, or where there is no digit at all, you
may not.  Hence: hy-phen-ation.  That’s all.
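
  The last step is small enough to sketch in a few lines of Python.
Again this is just an illustration under my own naming
(hyphenate_marked is not a TeX or library function); it takes a word
that already carries its digits and leaves out TeX's extra rule that
forbids breaks too close to either end of the word (\lefthyphenmin and
\righthyphenmin).

```python
def hyphenate_marked(marked):
    """Turn a digit-annotated word into a hyphenated one.

    An odd digit becomes a hyphen; an even digit is simply dropped.
    """
    out = []
    for ch in marked:
        if ch.isdigit():
            if int(ch) % 2 == 1:
                out.append("-")
        else:
            out.append(ch)
    return "".join(out)

# The annotated word from the walk-through, with every winning digit written out.
print(hyphenate_marked("hy3phe2n5a4t2io2n"))  # hy-phen-ation
```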

  Hope this helps!

	Best,

		Arthur

