[tex-hyphen] US English patterns in hyph-en-us.pat.txt are buggy
Arthur Reutenauer
arthur.reutenauer at normalesup.org
Fri Jun 9 22:41:56 CEST 2017
Hi Roozbeh,
> First post to the list, reporting a bug. Please point me to the bug tracker
> if there is one.
The hyphenation patterns are now hosted on GitHub, and you can open an
issue there (https://github.com/hyphenation/tex-hyphen), but I’m happy
to reply here:
> Debugging an Android user report, I found that Android was hyphenating the
> words "democrat" and "democrats" incorrectly, as:
>
> de-mo-c-rat
> de-moc-rats
Thanks for the bug report. That does look bad :-)
> Digging deeper, the source of the problem seems to be the following pattern
> in hyph-en-us.pat.txt:
>
> 5moc1ra1t
>
> That pattern seems to not exist in Plain TeX's pattern file for US English.
> The other patterns applying to those words, all existing in Plain TeX, are:
>
> 1mo
> 4mocr
> 5crat.
>
> I think the source of the problem is that the authors of the extended
> pattern file derived the modified patterns based on TUGboat's exception
> list, they created that "5moc1ra1t" pattern based on the word
> "de-moc-ra-tism" and didn't notice that adding it would cause "democrat"
> and "democrats" to be hyphenated incorrectly.
I agree with that analysis. That’s indeed a common problem when adding
patterns to a list generated by patgen, which is how the en-US patterns
were produced.
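To make the interaction concrete, here is a minimal Python sketch of Liang's pattern-matching algorithm. It is not part of any TeX distribution, and the function and list names are hypothetical. It uses only the four patterns quoted above and ignores every other pattern in hyph-en-us.pat.txt, so it is a simplification. It also assumes the usual US English hyphenmin values of 2 and 3, meaning at least two letters before a break and three after it. At each position between letters, the highest digit from any matching pattern wins, and only odd values allow a break.

```python
# Hypothetical names; only the four patterns come from the bug report.
TEX_PATTERNS = ["1mo", "4mocr", "5crat."]
EXTENDED_PATTERNS = TEX_PATTERNS + ["5moc1ra1t"]


def parse_pattern(pattern):
    """Split e.g. "5moc1ra1t" into ("mocrat", [5, 0, 0, 1, 0, 1, 0])."""
    letters = ""
    digits = [0]  # digits[k] is the value just before letters[k]
    for ch in pattern:
        if ch.isdigit():
            digits[-1] = int(ch)
        else:
            letters += ch
            digits.append(0)
    return letters, digits


def liang_hyphenate(word, patterns, left_min=2, right_min=3):
    """Hyphenate `word` with Liang's algorithm using only `patterns`."""
    text = "." + word.lower() + "."  # "." marks the word boundaries
    points = [0] * (len(text) + 1)   # points[i]: gap just before text[i]
    for letters, digits in map(parse_pattern, patterns):
        start = text.find(letters)
        while start != -1:
            # Keep the highest digit seen at each gap.
            for k, d in enumerate(digits):
                points[start + k] = max(points[start + k], d)
            start = text.find(letters, start + 1)

    n = len(word)
    pieces, last = [], 0
    for j in range(1, n):  # j = number of letters before the gap
        odd = points[j + 1] % 2 == 1
        if odd and j >= left_min and n - j >= right_min:
            pieces.append(word[last:j])
            last = j
    pieces.append(word[last:])
    return "-".join(pieces)


for w in ["democrat", "democrats", "democratism"]:
    print(w, liang_hyphenate(w, TEX_PATTERNS), liang_hyphenate(w, EXTENDED_PATTERNS))
# democrat demo-crat de-mo-c-rat
# democrats democrats de-moc-rats
# democratism democratism de-moc-ra-tism
```

If you use only the three Plain TeX patterns, the 4 from 4mocr blocks the break after "de", and 5crat. allows demo-crat. Adding 5moc1ra1t produces the intended de-moc-ra-tism. However, its 5 and its two 1s override those values, and this yields de-mo-c-rat and de-moc-rats.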
> I believe as a temporary solution, the US English hyphenation patterns
> should be restored to the Plain TeX version, and the exception list should
> get extended to include everything in
> http://mirror.ctan.org/info/digests/tugboat/hyphenex/ushyphex.tex.
I don’t think we’d want to do that, but we’ll think of a solution.
Thank you again for reporting the issue.
Best,
Arthur