[tex-hyphen] Accuracy of the hyphenation algorithm

Joseph Wright joseph.wright at morningstar2.co.uk
Wed Jul 29 07:59:04 CEST 2015


On 29/07/2015 00:36, Yuri wrote:
> When I am looking at the algorithm results, I keep seeing a lot of
> inconsistencies.
> 
> Original hyphen.tex has some testcases in the end, that are supposedly
> the correct hyphenation points:
> as-so-ciate
> as-so-ciates
> dec-li-na-tion
> oblig-a-tory
> phil-an-thropic
> present
> presents
> project
> projects
> reci-procity
> re-cog-ni-zance
> ref-or-ma-tion
> ret-ri-bu-tion
> ta-ble
> 
> But when I run the algorithm with patterns from hyphen.tex, I get these
> results:
> as·so·ci·ate
> as·so·ci·ates
> de·cli·na·tion
> obli·ga·to·ry
> phi·lan·throp·ic
> p·re·sen·t
> p·re·sents
> pro·jec·t
> pro·ject·s
> re·ciproc·i·ty
> rec·og·nizance
> re·for·ma·tion
> re·tri·bu·tion
> table

What do you mean 'run the algorithm'? A plain TeX document

\showhyphens{
associate
associates
declination
obligatory
philanthropic
present
presents
project
projects
reciprocity
recognizance
reformation
retribution
table}
\bye

gives

    []  \tenrm as-so-ciate as-so-ciates dec-li-na-tion oblig-a-tory
    phil-an-thropic present presents project projects reci-procity
    re-cog-ni-zance ref-or-ma-tion ret-ri-bu-tion ta-ble

exactly as expected. (These are not 'test cases' but rather corrections
for words that the pattern-based approach gets wrong.)
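
For what it's worth, the list is fed to TeX with the \hyphenation
primitive. From memory (so do check against hyphen.tex itself), the end
of the file reads roughly

\hyphenation{as-so-ciate as-so-ciates dec-li-na-tion oblig-a-tory
  phil-an-thropic present presents project projects reci-procity
  re-cog-ni-zance ref-or-ma-tion ret-ri-bu-tion ta-ble}

TeX looks a word up in this exception list before it consults the
patterns, and an entry with no hyphens (present, project) means 'never
break this word'. I'd expect an implementation that loads only the
patterns and skips this list to give different answers for exactly
these words.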

> Available correct answers from the Merriam-Webster dictionary:
> as·so·ci·ate
> dec·li·na·tion
> oblig·a·to·ry
> phil·an·throp·ic
> pres·ent
> proj·ect
> rec·i·proc·i·ty
> re·cog·ni·zance
> ref·or·ma·tion
> ret·ri·bu·tion
> ta·ble

Well obviously DEK doesn't agree :-) The file hyphen.tex is defined by
Knuth and is not to be changed (as changing it would affect line
breaking in existing documents).

> Additionally, the produced "gen·uine" hyphenation split isn't correct
> (should be " gen·u·ine"), the word "toothache" isn't split at all, and
> "p·neu·mo·ni·a" result is wrong too (should be " pneu·mo·nia").
> 
> I tried Hyphenator.js JavaScript implementation
> (https://github.com/mnater/hyphenator) with pattern set from hyphen.tex,
> reviewed the algorithm there in detail, and it seems correct. I didn't
> try the Tex implementation.
> 
> Franklin Liang paper says that this algorithm almost always produces
> correct results.

That's true: 'almost' always. The whole reason exceptions exist is that
patterns alone do not hyphenate every word correctly.
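
If you want different breaks in your own documents, the place to do
that is your own \hyphenation list, not hyphen.tex. A minimal sketch,
taking the break points for 'genuine' and 'pneumonia' from your message
and assuming 'tooth-ache' for the third word (that one is my guess, not
something I have checked in a dictionary):

\hyphenation{gen-u-ine tooth-ache pneu-mo-nia}
\showhyphens{genuine toothache pneumonia}
\bye

The exceptions are stored for whichever language is current when
\hyphenation is read, so with plain TeX's defaults this only affects
the (US) English set-up.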
--
Joseph Wright



