[tex-hyphen] Accuracy of the hyphenation algorithm

Yuri yuri at rawbw.com
Wed Jul 29 01:36:38 CEST 2015


When I look at the results of the algorithm, I keep seeing a lot of 
inconsistencies.

The original hyphen.tex has a list of words at the end, which I take to 
be test cases with the correct hyphenation points:
as-so-ciate
as-so-ciates
dec-li-na-tion
oblig-a-tory
phil-an-thropic
present
presents
project
projects
reci-procity
re-cog-ni-zance
ref-or-ma-tion
ret-ri-bu-tion
ta-ble

But when I run the algorithm with the patterns from hyphen.tex, I get 
these results:
as·so·ci·ate
as·so·ci·ates
de·cli·na·tion
obli·ga·to·ry
phi·lan·throp·ic
p·re·sen·t
p·re·sents
pro·jec·t
pro·ject·s
re·ciproc·i·ty
rec·og·nizance
re·for·ma·tion
re·tri·bu·tion
table

The correct answers, where available, from the Merriam-Webster dictionary:
as·so·ci·ate
dec·li·na·tion
oblig·a·to·ry
phil·an·throp·ic
pres·ent
proj·ect
rec·i·proc·i·ty
re·cog·ni·zance
ref·or·ma·tion
ret·ri·bu·tion
ta·ble
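
To make the comparison above easier to check, here is a small sketch 
that lines up the produced results against the Merriam-Webster entries 
and reports, per word, which break points are extra and which are 
missing. The two lists are copied from this message; the helper names 
(break_points, compare) are only illustrative and do not come from 
hyphen.tex or Hyphenator.js.

```python
SEP = "·"

# Both lists are copied from this message (singular forms only, since
# the dictionary list has no plurals).
produced = (
    "as·so·ci·ate de·cli·na·tion obli·ga·to·ry phi·lan·throp·ic "
    "p·re·sen·t pro·jec·t re·ciproc·i·ty rec·og·nizance "
    "re·for·ma·tion re·tri·bu·tion table"
).split()
dictionary = (
    "as·so·ci·ate dec·li·na·tion oblig·a·to·ry phil·an·throp·ic "
    "pres·ent proj·ect rec·i·proc·i·ty re·cog·ni·zance "
    "ref·or·ma·tion ret·ri·bu·tion ta·ble"
).split()


def break_points(hyphenated, sep=SEP):
    """Return the set of letter offsets at which the word is split."""
    positions, offset = set(), 0
    for piece in hyphenated.split(sep)[:-1]:
        offset += len(piece)
        positions.add(offset)
    return positions


def compare(got_list, want_list, sep=SEP):
    """Return (word, extra_breaks, missing_breaks) for each word pair."""
    rows = []
    for got, want in zip(got_list, want_list):
        word = want.replace(sep, "")
        # Both entries must spell the same word, or the offsets mean nothing.
        assert got.replace(sep, "") == word
        g, w = break_points(got, sep), break_points(want, sep)
        rows.append((word, sorted(g - w), sorted(w - g)))
    return rows


if __name__ == "__main__":
    for word, extra, missing in compare(produced, dictionary):
        status = "ok" if not extra and not missing else "DIFFERS"
        print(f"{word:15} {status:8} extra={extra} missing={missing}")
```

For the eleven words above, only "associate" agrees with the dictionary 
at every break point.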

Additionally, the produced split "gen·uine" isn't correct (it should be 
"gen·u·ine"), the word "toothache" isn't split at all, and the result 
"p·neu·mo·ni·a" is wrong too (it should be "pneu·mo·nia").

I tried the Hyphenator.js JavaScript implementation 
(https://github.com/mnater/hyphenator) with the pattern set from 
hyphen.tex and reviewed the algorithm there in detail; it seems correct. 
I didn't try the TeX implementation.
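
For reference, this is a minimal sketch of the pattern-matching step as 
I understand it: every pattern that occurs in the dot-padded word 
contributes its digits, the highest digit wins at each position, and an 
odd value marks a break. It is not the Hyphenator.js code or the TeX 
code. The function names are mine, and the two patterns in the usage 
example below are made up for illustration, not taken from hyphen.tex.

```python
import re


def parse_pattern(pattern):
    """Split a pattern such as '1na' into its letters and digit weights.

    weights[k] is the value for the gap just before letter k, and the
    last entry is the gap after the final letter.
    """
    letters = re.sub(r"\d", "", pattern)
    weights = [0] * (len(letters) + 1)
    pos = 0
    for ch in pattern:
        if ch.isdigit():
            weights[pos] = int(ch)
        else:
            pos += 1
    return letters, weights


def hyphenate(word, patterns, sep="·"):
    """Apply the patterns to one word and join the pieces with sep."""
    table = dict(parse_pattern(p) for p in patterns)
    padded = "." + word.lower() + "."  # dots mark the word boundaries
    scores = [0] * (len(padded) + 1)   # scores[i]: gap before padded[i]

    # Try every substring of the padded word against the pattern table
    # and keep the maximum weight seen at each gap.
    for start in range(len(padded)):
        for end in range(start + 1, len(padded) + 1):
            weights = table.get(padded[start:end])
            if weights is not None:
                for k, w in enumerate(weights):
                    scores[start + k] = max(scores[start + k], w)

    # The gap between word[j-1] and word[j] is the gap before
    # padded[j+1]; an odd score there means a break is allowed.
    pieces, last = [], 0
    for j in range(1, len(word)):
        if scores[j + 1] % 2 == 1:
            pieces.append(word[last:j])
            last = j
    pieces.append(word[last:])
    return sep.join(pieces)
```

With the made-up pattern "1na" alone, hyphenate("banana", ["1na"]) gives 
"ba·na·na". Adding the made-up pattern "2na." raises the last gap to an 
even value, so hyphenate("banana", ["1na", "2na."]) gives "ba·nana".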

Franklin Liang's paper says that this algorithm almost always produces 
correct results.

So how can these discrepancies be explained? Why aren't even the test 
cases from hyphen.tex reproducible? Is the implementation of the 
algorithm incorrect? Is something missing?


Yuri