<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<div class="moz-cite-prefix">Arthur Reutenauer wrote:<br>
<br>
</div>
<blockquote type="cite" cite="mid:20190822132757.tas72hii2dt4l7lc@phare.normalesup.org">
<pre wrap="">In order to hyphenate a word in a given language, you need a list of
patterns for that language. Let’s say the word is “hyphenation” and the
patterns are Knuth and Liang’s file hyphen.tex (available from CTAN:
<a class="moz-txt-link-freetext" href="http://mirror.ctan.org/systems/knuth/dist/lib/hyphen.tex">http://mirror.ctan.org/systems/knuth/dist/lib/hyphen.tex</a>).
</pre>
</blockquote>
<br>
I think that what Arthur has written is very helpful, but it will surely leave the intelligent reader asking "But how were those patterns generated, and what do the numbers mean?" The introduction to Patgen.web sheds some light on this:<br>
<br>
<blockquote type="cite">Introduction. This program takes a list of hyphenated words and generates a set of patterns that<br>
can be used by the TeX82 hyphenation algorithm.<br>
<br>
The patterns consist of strings of letters and digits, where a digit indicates a 'hyphenation value' for some<br>
intercharacter position. For example, the pattern "3t2ion" specifies that if the string "tion" occurs in a word,<br>
we should assign a hyphenation value of 3 to the position immediately before the "t", and a value of 2 to the<br>
position between the "t" and the "i".<br>
<br>
The patterns are generated in a series of sequential passes through the dictionary. In each pass, we<br>
collect count statistics for a particular type of pattern, taking into account the effect of patterns chosen in<br>
previous passes. At the end of a pass, the counts are examined and new patterns are selected.<br>
<br>
Patterns are chosen one level at a time, in order of increasing hyphenation value. In the sample run<br>
shown below, the parameters "hyph_start" and "hyph_finish" specify the first and last levels respectively to be<br>
generated.<br>
<br>
Patterns at each level are chosen in order of increasing pattern length (usually starting with length 2).<br>
This is controlled by the parameters "pat_start" and "pat_finish" specified at the beginning of each level.<br>
Furthermore patterns of the same length applying to different intercharacter positions are chosen in<br>
separate passes through the dictionary. Since patterns of length n may apply to n + 1 different positions,<br>
choosing a set of patterns of lengths 2 through n for a given level requires (n+1)(n+2)/2 − 3 passes through<br>
the word list.<br>
<br>
At each level, the selection of patterns is controlled by the three parameters "good_wt", "bad_wt" and "thresh".<br>
A hyphenating pattern will be selected if good * good_wt − bad * bad_wt ≥ thresh, where "good" and "bad" are<br>
the number of times the pattern could and could not be hyphenated respectively at a particular point.<br>
For inhibiting patterns, "good" is the number of errors inhibited, and "bad" is the number of previously found<br>
hyphens inhibited.<br>
</blockquote>
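<br>
To make the digit notation concrete, here is a minimal Python sketch of how a pattern such as "3t2ion" might be split into its letters and its per-position hyphenation values. It is not taken from Patgen.web; the function name parse_pattern is hypothetical, and the sketch assumes single-digit values, with 0 for any position that carries no digit.<br>
<br>
```python
def parse_pattern(pattern):
    """Split a pattern into (letters, values).

    values[i] is the hyphenation value of the position just before
    letters[i]; the final entry is the position after the last letter.
    Positions without a digit are given the value 0 (an assumption of
    this sketch).
    """
    letters = []
    values = [0]  # value for the position before the first letter
    for ch in pattern:
        if ch.isdigit():
            # A digit sets the value of the current intercharacter position.
            values[-1] = int(ch)
        else:
            # A letter closes the current position and opens the next one.
            letters.append(ch)
            values.append(0)
    return "".join(letters), values


letters, values = parse_pattern("3t2ion")
print(letters)  # tion
print(values)   # [3, 2, 0, 0, 0]
```
<br>
The four letters of "tion" give five intercharacter positions, which matches the remark in the quoted text that a pattern of length n may apply to n + 1 positions.<br>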
<br>
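The pass count and the selection rule quoted above can also be checked with a short Python sketch. This is only an illustration under stated assumptions, not Patgen's code: the function names passes_for_level and is_selected are hypothetical, and the counts and weights in the usage lines are made-up numbers rather than values from any real run.<br>
<br>
```python
def passes_for_level(n):
    """Count the passes needed for pattern lengths 2 through n.

    Each pattern length applies to (length + 1) intercharacter
    positions, and each position is handled in its own pass.
    """
    return sum(length + 1 for length in range(2, n + 1))


def is_selected(good, bad, good_wt, bad_wt, thresh):
    """Selection rule from the quoted text:
    good * good_wt - bad * bad_wt >= thresh."""
    return good * good_wt - bad * bad_wt >= thresh


# The direct count agrees with the closed form (n+1)(n+2)/2 - 3.
for n in range(2, 10):
    assert passes_for_level(n) == (n + 1) * (n + 2) // 2 - 3

print(passes_for_level(5))  # 18, i.e. 3 + 4 + 5 + 6

# Hypothetical counts and weights: 10*1 - 2*3 = 4, which meets thresh = 3.
print(is_selected(good=10, bad=2, good_wt=1, bad_wt=3, thresh=3))  # True
```
<br>
With these illustrative weights, each "bad" occurrence costs three times what a "good" one earns, so raising bad to 3 would leave 10 − 9 = 1 and the pattern would be rejected.<br>
<br>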
The interested reader is referred to, e.g., <a class="moz-txt-link-freetext" href="http://readytext.co.uk/files/patgen.pdf">http://readytext.co.uk/files/patgen.pdf</a>.<br>
<i>Philip Taylor</i><i><br>
</i><br>
</body>
</html>