[tex-hyphen] Accuracy of the hyphenation algorithm
Yuri
yuri at rawbw.com
Wed Jul 29 01:36:38 CEST 2015
When I look at the algorithm's results, I keep seeing a lot of
inconsistencies.
The original hyphen.tex has some test cases at the end that are supposedly
the correct hyphenation points:
as-so-ciate
as-so-ciates
dec-li-na-tion
oblig-a-tory
phil-an-thropic
present
presents
project
projects
reci-procity
re-cog-ni-zance
ref-or-ma-tion
ret-ri-bu-tion
ta-ble
But when I run the algorithm with patterns from hyphen.tex, I get these
results:
as·so·ci·ate
as·so·ci·ates
de·cli·na·tion
obli·ga·to·ry
phi·lan·throp·ic
p·re·sen·t
p·re·sents
pro·jec·t
pro·ject·s
re·ciproc·i·ty
rec·og·nizance
re·for·ma·tion
re·tri·bu·tion
table
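For reference, here is a minimal Python sketch of the pattern-matching step as I understand it from Liang's description. It is not the Hyphenator.js code, and the pattern list is a small hand-picked illustration, not the hyphen.tex set. Each pattern carries digits between its letters; every pattern that matches a substring of the dot-padded word contributes its digits, the maximum wins at each gap, and an odd value permits a break. The sketch covers only this step and does not model anything else TeX may do around it.

```python
import re

def parse_pattern(pattern):
    """Split a Liang pattern such as 'hy3ph' into its letters and gap weights."""
    letters = re.sub(r"\d", "", pattern)
    weights = [0] * (len(letters) + 1)   # one weight per gap around the letters
    pos = 0
    for ch in pattern:
        if ch.isdigit():
            weights[pos] = int(ch)
        else:
            pos += 1
    return letters, weights

def hyphenate(word, patterns):
    """Return the fragments of `word` using pattern matching only."""
    table = dict(parse_pattern(p) for p in patterns)
    padded = "." + word.lower() + "."    # dots mark the word boundaries
    scores = [0] * (len(padded) + 1)     # scores[i] is the gap before padded[i]
    for start in range(len(padded)):
        for end in range(start + 1, len(padded) + 1):
            weights = table.get(padded[start:end])
            if weights:
                for k, w in enumerate(weights):
                    scores[start + k] = max(scores[start + k], w)
    pieces, last = [], 0
    for j in range(1, len(word)):
        if scores[j + 1] % 2 == 1:       # odd value: a break is allowed here
            pieces.append(word[last:j])
            last = j
    pieces.append(word[last:])
    return pieces

# Illustrative patterns only (an assumption, not the hyphen.tex pattern set).
TOY_PATTERNS = ["hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io"]

print("·".join(hyphenate("hyphenation", TOY_PATTERNS)))  # hy·phen·ation
```

With no patterns at all the word comes back unsplit, so any break in the output is traceable to a specific pattern.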
Correct answers, where available, from the Merriam-Webster dictionary:
as·so·ci·ate
dec·li·na·tion
oblig·a·to·ry
phil·an·throp·ic
pres·ent
proj·ect
rec·i·proc·i·ty
re·cog·ni·zance
ref·or·ma·tion
ret·ri·bu·tion
ta·ble
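The two lists can be compared mechanically. The following Python sketch (helper names `breaks` and `compare` are made up for illustration) takes the algorithm output and the Merriam-Webster entries quoted in this message for the same eleven words and reports, per word, which dictionary break points were missed and which produced break points are not in the dictionary.

```python
# Algorithm output and dictionary entries, copied from the lists in this message.
ALGORITHM = ("as·so·ci·ate de·cli·na·tion obli·ga·to·ry phi·lan·throp·ic "
             "p·re·sen·t pro·jec·t re·ciproc·i·ty rec·og·nizance "
             "re·for·ma·tion re·tri·bu·tion table").split()
DICTIONARY = ("as·so·ci·ate dec·li·na·tion oblig·a·to·ry phil·an·throp·ic "
              "pres·ent proj·ect rec·i·proc·i·ty re·cog·ni·zance "
              "ref·or·ma·tion ret·ri·bu·tion ta·ble").split()

def breaks(hyphenated, sep="·"):
    """Return the set of character offsets at which the word is split."""
    positions, offset = set(), 0
    for piece in hyphenated.split(sep)[:-1]:
        offset += len(piece)
        positions.add(offset)
    return positions

def compare(algorithm, dictionary, sep="·"):
    """Map each word to its missed and spurious break offsets."""
    report = {}
    for got, want in zip(algorithm, dictionary):
        word = want.replace(sep, "")
        if got.replace(sep, "") != word:
            raise ValueError(f"lists are misaligned at {got!r} / {want!r}")
        g, w = breaks(got, sep), breaks(want, sep)
        report[word] = {"missed": sorted(w - g), "spurious": sorted(g - w)}
    return report

for word, diff in compare(ALGORITHM, DICTIONARY).items():
    status = "ok" if not (diff["missed"] or diff["spurious"]) else diff
    print(word, status)
```

On this data only "associate" agrees exactly with the dictionary; the other ten words differ in at least one break point.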
Additionally, the "gen·uine" split that the algorithm produces isn't correct
(it should be "gen·u·ine"), the word "toothache" isn't split at all, and
the "p·neu·mo·ni·a" result is wrong too (it should be "pneu·mo·nia").
I tried the Hyphenator.js JavaScript implementation
(https://github.com/mnater/hyphenator) with the pattern set from hyphen.tex
and reviewed the algorithm there in detail, and it seems correct. I didn't
try the TeX implementation.
Franklin Liang's paper says that this algorithm almost always produces
correct results.
So how can these discrepancies be explained? Why aren't even the test cases
from hyphen.tex reproducible? Is the algorithm implementation incorrect?
Is something missing?
Yuri