[tex-hyphen] Accuracy of the hyphenation algorithm

Martijn van der Lee martijn at crmmailtech.nl
Wed Jul 29 10:21:11 CEST 2015


I'm guessing the bad "p-neu-mo-ni-a" may be caused by missing support for
LEFTHYPHENMIN and RIGHTHYPHENMIN (the minimum number of letters before the
first break and after the last break) in the implementation used.
Off the top of my head, both are at least 2 for English.
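
To illustrate, here is a minimal sketch of that filtering step in Python.
The function names are hypothetical (this is not the phpSyllable API), the
break positions are simply the ones reported in the quoted message, and the
defaults of 2 and 3 are the values plain TeX sets for \lefthyphenmin and
\righthyphenmin; adjust them if your implementation uses other values.

```python
def apply_hyphen_mins(word, breaks, left_min=2, right_min=3):
    """Keep only break points that leave at least left_min letters
    before the break and at least right_min letters after it.

    Each entry i in `breaks` means "a hyphen may go before word[i]".
    """
    n = len(word)
    return [i for i in sorted(breaks) if left_min <= i <= n - right_min]


def render(word, breaks, sep="-"):
    """Join the fragments of `word` cut at the given break points."""
    parts = []
    prev = 0
    for i in sorted(breaks):
        parts.append(word[prev:i])
        prev = i
    parts.append(word[prev:])
    return sep.join(parts)


# Raw break points corresponding to the reported "p-neu-mo-ni-a".
word = "pneumonia"
raw = [1, 4, 6, 8]

print(render(word, raw))                           # p-neu-mo-ni-a
print(render(word, apply_hyphen_mins(word, raw)))  # pneu-mo-nia
```

With the minimums applied, the one-letter fragments at both ends disappear
and the result matches the "pneu-mo-nia" that TeX reports below.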

kind regards,
Martijn van der Lee (developer of the phpSyllable implementation for PHP at
https://github.com/vanderlee/phpSyllable).

2015-07-29 10:07 GMT+02:00 Philip Taylor <P.Taylor at rhul.ac.uk>:

>
>
> Yuri wrote:
>
> > When I am looking at the algorithm results, I keep seeing a lot of
> > inconsistencies.
> >
> > The original hyphen.tex has some test cases at the end that are
> > supposedly the correct hyphenation points:
>
> No, these are not test cases; they are explicit hyphenations (i.e.,
> exceptions) that correct the results that would otherwise be obtained
> using only the patterns.
>
> > But when I run the algorithm with patterns from hyphen.tex, I get these
> > results:
> > as·so·ci·ate
> > as·so·ci·ates
> > de·cli·na·tion
> > obli·ga·to·ry
> > phi·lan·throp·ic
> > p·re·sen·t
> > p·re·sents
> > pro·jec·t
> > pro·ject·s
> > re·ciproc·i·ty
> > rec·og·nizance
> > re·for·ma·tion
> > re·tri·bu·tion
> > table
>
> Yes, that is exactly the point.  Those words are known to be hyphenated
> incorrectly using the patterns alone, whence the list of exceptions.
>
> > Available correct answers from the Merriam-Webster dictionary:
> > as·so·ci·ate
> > dec·li·na·tion
> > oblig·a·to·ry
> > phil·an·throp·ic
> > pres·ent
> > proj·ect
> > rec·i·proc·i·ty
> > re·cog·ni·zance
> > ref·or·ma·tion
> > ret·ri·bu·tion
> > ta·ble
>
> TeX gives these break-points for your word list :
>
> as-so-ciate
> as-so-ciates
> dec-li-na-tion
> oblig-a-tory
> phil-an-thropic
> present
> presents
> project
> projects
> reci-procity
> re-cog-ni-zance
> ref-or-ma-tion
> ret-ri-bu-tion
> ta-ble
>
> Thus there are differences, but it is quite possible that Don Knuth did
> not use Merriam-Webster as his authoritative source for hyphenation in
> <Am.E>.
>
> > Additionally, the produced "gen·uine" hyphenation split isn't correct
> > (should be "gen·u·ine"), the word "toothache" isn't split at all, and
> > the "p·neu·mo·ni·a" result is wrong too (should be "pneu·mo·nia").
>
> TeX
> This is TeX, Version 3.14159265 (TeX Live 2014/W32TeX) (preloaded
> format=tex)
> **\showhyphens {genuine toothache pneumonia}
>
> gen-uine toothache pneu-mo-nia
>
> Thus "pneumonia" is hyphenated correctly, "genuine" arguably so
> (depending on whether or not one regards the "u" as syllabic) and
> "toothache" is indeed wrong.
>
> Philip Taylor
>