[tex-hyphen] Accuracy of the hyphenation algorithm
Philip Taylor
P.Taylor at Rhul.Ac.Uk
Wed Jul 29 10:07:52 CEST 2015
Yuri wrote:
> When I am looking at the algorithm results, I keep seeing a lot of
> inconsistencies.
>
> Original hyphen.tex has some test cases at the end that are supposedly
> the correct hyphenation points:
No, these are not test cases; they are explicit hyphenations (i.e.,
exceptions) that correct the results that would otherwise be obtained
using only the patterns.
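A minimal Python sketch of the idea: when a word has an exception entry, that entry replaces whatever the patterns would produce. It follows Liang's pattern scheme as used by TeX (at each gap between letters the highest digit from any matching pattern wins, and an odd value permits a break). It also applies plain TeX's \lefthyphenmin=2 and \righthyphenmin=3 to both sources of breaks. The patterns ("e1s", "n1t"), the tiny exception list and all function names are hypothetical toys for illustration, not the contents of hyphen.tex.

```python
import re

# Plain TeX's defaults: no break leaving fewer than 2 letters before it
# or fewer than 3 letters after it.
LEFT_HYPHEN_MIN = 2
RIGHT_HYPHEN_MIN = 3


def parse_pattern(pat):
    """Split a Liang pattern such as 'e1s' into its letters and its digits."""
    letters = re.sub(r"\d", "", pat)
    values = [0] * (len(letters) + 1)  # one slot per gap around the letters
    i = 0
    for ch in pat:
        if ch.isdigit():
            values[i] = int(ch)
        else:
            i += 1
    return letters, values


def pattern_breaks(word, patterns):
    """Positions k (break after word[:k]) where the highest digit is odd."""
    text = "." + word.lower() + "."  # '.' marks the word boundaries
    scores = [0] * (len(text) + 1)   # scores[i] is the gap before text[i]
    for pat in patterns:
        letters, values = parse_pattern(pat)
        start = text.find(letters)
        while start != -1:  # every (possibly overlapping) occurrence
            for j, v in enumerate(values):
                scores[start + j] = max(scores[start + j], v)
            start = text.find(letters, start + 1)
    # The leading '.' shifts indices by one: the gap after word[:k] is scores[k + 1].
    return {k for k in range(1, len(word)) if scores[k + 1] % 2 == 1}


def hyphenate(word, patterns, exceptions,
              left=LEFT_HYPHEN_MIN, right=RIGHT_HYPHEN_MIN):
    """Insert '-' at allowed breaks; an exception entry overrides the patterns."""
    entry = exceptions.get(word.lower())
    if entry is not None:
        breaks, k = set(), 0
        for ch in entry:  # e.g. 'ta-ble' -> {2}; 'present' -> no breaks
            if ch == "-":
                breaks.add(k)
            else:
                k += 1
    else:
        breaks = pattern_breaks(word, patterns)
    # In this sketch the length limits apply to both sources of breaks.
    breaks = {k for k in breaks if left <= k <= len(word) - right}
    return "".join(("-" if i in breaks else "") + ch for i, ch in enumerate(word))


# Hypothetical toy data; exception entries are keyed by the bare word.
PATTERNS = ["e1s", "n1t"]
EXCEPTIONS = {e.replace("-", ""): e for e in ["ta-ble", "present"]}

print(hyphenate("present", PATTERNS, {}))          # pre-sent (patterns alone)
print(hyphenate("present", PATTERNS, EXCEPTIONS))  # present (exception wins)
print(hyphenate("resent", PATTERNS, EXCEPTIONS))   # re-sent ("n1t" break dropped)
```

An exception written without any hyphens, like "present" here, suppresses hyphenation of that word entirely, however many breaks the patterns would have found.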
> But when I run the algorithm with patterns from hyphen.tex, I get these
> results:
> as·so·ci·ate
> as·so·ci·ates
> de·cli·na·tion
> obli·ga·to·ry
> phi·lan·throp·ic
> p·re·sen·t
> p·re·sents
> pro·jec·t
> pro·ject·s
> re·ciproc·i·ty
> rec·og·nizance
> re·for·ma·tion
> re·tri·bu·tion
> table
Yes, that is exactly the point. Those words are known to be hyphenated
incorrectly using the patterns alone, whence the list of exceptions.
> Available correct answers from the Merriam-Webster dictionary:
> as·so·ci·ate
> dec·li·na·tion
> oblig·a·to·ry
> phil·an·throp·ic
> pres·ent
> proj·ect
> rec·i·proc·i·ty
> re·cog·ni·zance
> ref·or·ma·tion
> ret·ri·bu·tion
> ta·ble
TeX gives these break points for your word list:
as-so-ciate
as-so-ciates
dec-li-na-tion
oblig-a-tory
phil-an-thropic
present
presents
project
projects
reci-procity
re-cog-ni-zance
ref-or-ma-tion
ret-ri-bu-tion
ta-ble
Thus there are differences, but it is quite possible that Don Knuth did
not use Merriam-Webster as his authoritative source for hyphenation in
American English.
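To see exactly where the two lists disagree, one can compare break positions word by word. The following is a minimal Python sketch that uses only the forms quoted in this thread: TeX's output (marked with "-") and the Merriam-Webster entries (marked with "·"). The helper names are hypothetical.

```python
def break_positions(hyphenated, sep):
    """Positions k such that a break is marked after the first k letters."""
    positions, k = set(), 0
    for ch in hyphenated:
        if ch == sep:
            positions.add(k)
        else:
            k += 1
    return positions


def compare(tex, mw):
    """Return (breaks only in the Merriam-Webster form, breaks only in TeX's)."""
    t = break_positions(tex, "-")
    m = break_positions(mw, "·")
    return sorted(m - t), sorted(t - m)


# Forms copied from the thread: TeX's output and the Merriam-Webster entries.
PAIRS = [
    ("as-so-ciate", "as·so·ci·ate"),
    ("dec-li-na-tion", "dec·li·na·tion"),
    ("oblig-a-tory", "oblig·a·to·ry"),
    ("phil-an-thropic", "phil·an·throp·ic"),
    ("present", "pres·ent"),
    ("project", "proj·ect"),
    ("reci-procity", "rec·i·proc·i·ty"),
    ("re-cog-ni-zance", "re·cog·ni·zance"),
    ("ref-or-ma-tion", "ref·or·ma·tion"),
    ("ret-ri-bu-tion", "ret·ri·bu·tion"),
    ("ta-ble", "ta·ble"),
]

for tex, mw in PAIRS:
    only_mw, only_tex = compare(tex, mw)
    if only_mw or only_tex:
        print(f"{tex.replace('-', '')}: only M-W {only_mw}, only TeX {only_tex}")
```

For these eleven words, every difference is a break that Merriam-Webster shows and TeX omits (e.g. "ci·ate" in "associate", and three breaks in "reciprocity"). TeX does not add any break that is absent from the Merriam-Webster forms.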
> Additionally, the produced "gen·uine" hyphenation split isn't correct
> (should be "gen·u·ine"), the word "toothache" isn't split at all, and
> the "p·neu·mo·ni·a" result is wrong too (should be "pneu·mo·nia").
TeX
This is TeX, Version 3.14159265 (TeX Live 2014/W32TeX) (preloaded format=tex)
**\showhyphens {genuine toothache pneumonia}
gen-uine toothache pneu-mo-nia
Thus "pneumonia" is hyphenated correctly, "genuine" arguably so
(depending on whether or not one regards the "u" as syllabic) and
"toothache" is indeed wrong.
Philip Taylor