[tex-hyphen] US English patterns in hyph-en-us.pat.txt are buggy

Roozbeh Pournader roozbeh at google.com
Fri Jun 9 22:16:54 CEST 2017


Hi,

First post to the list, reporting a bug. Please point me to the bug tracker
if there is one.

Debugging an Android user report, I found that Android was hyphenating the
words "democrat" and "democrats" incorrectly, as:

de-mo-c-rat
de-moc-rats

Merriam-Webster, by contrast, recommends:

dem-o-crat

And Plain TeX was hyphenating them as (leaving "democrats" unbroken):

demo-crat
democrats

Digging deeper, the source of the problem seems to be the following pattern
in hyph-en-us.pat.txt:

5moc1ra1t

That pattern does not seem to exist in Plain TeX's pattern file for US
English. The other patterns that apply to those words, all of which exist in
Plain TeX, are:

1mo
4mocr
5crat.

I think the source of the problem is that the authors of the extended
pattern file derived the modified patterns from TUGboat's exception list.
They created the "5moc1ra1t" pattern from the word "de-moc-ra-tism" and
didn't notice that adding it would cause "democrat" and "democrats" to be
hyphenated incorrectly.
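
As a rough check of that analysis, here is a minimal Python sketch of
Liang-style pattern matching restricted to the four patterns named above.
It is not the code used by TeX or Android. The function names, and the
assumption that no other patterns touch these words, are mine. The minimums
of 2 letters before and 3 letters after a break are Plain TeX's defaults,
and I am assuming Android uses the same values.

```python
def parse_pattern(pat):
    """Split a pattern such as '5moc1ra1t' into its letters and the
    numeric value attached to each inter-letter slot."""
    letters = ""
    values = [0]
    for ch in pat:
        if ch.isdigit():
            values[-1] = int(ch)   # digit belongs to the slot before the next letter
        else:
            letters += ch
            values.append(0)       # open a new slot after this letter
    return letters, values


def hyphenate(word, patterns, left_min=2, right_min=3):
    """Apply the patterns to one word and return it with '-' at break points."""
    padded = "." + word + "."              # '.' marks the word boundaries
    points = [0] * (len(padded) + 1)       # points[j] is the slot before padded[j]
    for pat in patterns:
        letters, values = parse_pattern(pat)
        start = padded.find(letters)
        while start != -1:
            for k, v in enumerate(values):
                # the highest value seen for a slot wins
                points[start + k] = max(points[start + k], v)
            start = padded.find(letters, start + 1)
    pieces, last = [], 0
    # a break before word[i] needs left_min letters before and right_min after
    for i in range(left_min, len(word) - right_min + 1):
        if points[i + 1] % 2 == 1:         # odd value allows a break, even forbids it
            pieces.append(word[last:i])
            last = i
    pieces.append(word[last:])
    return "-".join(pieces)


# Only the patterns mentioned in this message, not the full pattern files.
PLAIN_TEX = ["1mo", "4mocr", "5crat."]
EXTENDED = PLAIN_TEX + ["5moc1ra1t"]

for w in ("democrat", "democrats", "democratism"):
    print(w, hyphenate(w, PLAIN_TEX), hyphenate(w, EXTENDED))
```

With only the three Plain TeX patterns this gives "demo-crat" and an
unbroken "democrats". Adding "5moc1ra1t" gives "de-mo-c-rat",
"de-moc-rats", and "de-moc-ra-tism", since its 5 before "m" outranks the 4
from "4mocr" and its 1 before "r" introduces a new break.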

I doubt these two words are the only casualties. There are probably dozens
of other words affected by the same problem of over-weighting the exception
list.

As a temporary solution, I believe the US English hyphenation patterns
should be restored to the Plain TeX version, and the exception list should
be extended to include everything in
http://mirror.ctan.org/info/digests/tugboat/hyphenex/ushyphex.tex.

I can prepare a patch, if that's useful.

Best,
Roozbeh