[tex-hyphen] Accuracy of the hyphenation algorithm
joseph.wright at morningstar2.co.uk
Wed Jul 29 07:59:04 CEST 2015
On 29/07/2015 00:36, Yuri wrote:
> When I am looking at the algorithm results, I keep seeing a lot of
> Original hyphen.tex has some test cases at the end, that are supposedly
> the correct hyphenation points:
> But when I run the algorithm with patterns from hyphen.tex, I get these
What do you mean 'run the algorithm'? A plain TeX document
 \tenrm as-so-ciate as-so-ciates dec-li-na-tion oblig-a-tory
phil-an-thropic present presents project projects reci-procity
re-cog-ni-zance ref-or-ma-tion ret-ri-bu-tion ta-ble
exactly as expected. (These are not 'test cases', but rather corrections
for words that the pattern-based approach fails with.)
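
For anyone wanting to repeat the check, a minimal plain TeX sketch along these lines should write a line like the one above to the log. The word list is simply the logged output with the hyphens removed, so treat it as an assumption about the input rather than the exact file that was run:

```tex
% Plain TeX: run with `tex`, then look in the .log file.
% \showhyphens typesets nothing useful; it forces an underfull box
% so that TeX reports every break point it would allow in each word.
\showhyphens{associate associates declination obligatory
  philanthropic present presents project projects reciprocity
  recognizance reformation retribution table}
\bye
```

The break points are reported as part of an "Underfull \hbox" message, so the message itself is expected and not an error.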
> Available correct answers from the Merriam-Webster dictionary:
Well obviously DEK doesn't agree :-) The file hyphen.tex is defined by
Knuth and is not to be changed (as it would affect line breaking in
existing documents).
> Additionally, the produced "gen·uine" hyphenation split isn't correct
> (should be "gen·u·ine"), the word "toothache" isn't split at all, and
> the "p·neu·mo·ni·a" result is wrong too (should be "pneu·mo·nia").
> (https://github.com/mnater/hyphenator) with pattern set from hyphen.tex,
> reviewed the algorithm there in detail, and it seems correct. I didn't
> try the TeX implementation.
> Franklin Liang's paper says that this algorithm almost always produces
> correct results.
That's true: 'almost' always. The entire need for exceptions is that not
every case is hyphenated correctly purely using patterns.
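
As a sketch of how exceptions work at the document level, without touching hyphen.tex: the \hyphenation primitive adds words to the exception list, and listed words bypass the patterns. The break points below are the dictionary ones quoted earlier in this thread, used purely for illustration; whether you want them is a separate question.

```tex
% Plain TeX: per-document exceptions; hyphen.tex stays unchanged.
% Each word lists its permitted break points explicitly.
\hyphenation{gen-u-ine tooth-ache pneu-mo-nia}

% Only here to check the result in the .log file.
\showhyphens{genuine toothache pneumonia}
\bye
```

Exceptions are still subject to \lefthyphenmin and \righthyphenmin, and they apply only to the exact word forms listed, so inflected forms need their own entries.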