[tex-hyphen] US English patterns in hyph-en-us.pat.txt are buggy

Roozbeh Pournader roozbeh at google.com
Fri Jun 9 22:45:19 CEST 2017


Thanks. Just reported at https://github.com/hyphenation/tex-hyphen/issues/15

On Fri, Jun 9, 2017 at 1:41 PM, Arthur Reutenauer <arthur.reutenauer at normalesup.org> wrote:

>         Hi Roozbeh,
>
> > First post to the list, reporting a bug. Please point me to the bug
> > tracker if there is one.
>
>   The hyphenation patterns are now hosted on GitHub, and you can open an
> issue there (https://github.com/hyphenation/tex-hyphen), but I’m happy
> to reply here:
>
> > Debugging an Android user report, I found that Android was hyphenating
> > the words "democrat" and "democrats" incorrectly, as:
> >
> > de-mo-c-rat
> > de-moc-rats
>
>   Thanks for the bug report.  That does look bad :-)
>
> > Digging deeper, the source of the problem seems to be the following
> > pattern in hyph-en-us.pat.txt:
> >
> > 5moc1ra1t
> >
> > That pattern does not seem to exist in Plain TeX's pattern file for US
> > English. The other patterns applying to those words, all existing in
> > Plain TeX, are:
> >
> > 1mo
> > 4mocr
> > 5crat.
> >
> > I think the source of the problem is that the authors of the extended
> > pattern file derived the modified patterns from TUGboat's exception
> > list. They created that "5moc1ra1t" pattern based on the word
> > "de-moc-ra-tism" and didn't notice that adding it would cause "democrat"
> > and "democrats" to be hyphenated incorrectly.
>
>   I agree with that analysis; that’s indeed a common problem when adding
> patterns to a list generated by patgen, as happened with the en-US
> patterns.
>
> > I believe that, as a temporary solution, the US English hyphenation
> > patterns should be restored to the Plain TeX version, and the exception
> > list should be extended to include everything in
> > http://mirror.ctan.org/info/digests/tugboat/hyphenex/ushyphex.tex.
>
>   I don’t think we’d like to do that, but we’ll think of a solution, and
> thank you again for reporting the issue.
>
>         Best,
>
>                 Arthur
>
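
For anyone following the analysis quoted above, here is a minimal sketch of
how Liang-style patterns combine, and why adding "5moc1ra1t" changes the
result for "democrat" and "democrats". It is not the TeX or Android
implementation. The names parse_pattern and hyphenate are made up for
illustration, only the four patterns quoted in this thread are loaded (so
results with the full pattern file, or for other words, may differ), and the
minimums of 2 letters before and 3 letters after a break are an assumption
that mirrors plain TeX's default \lefthyphenmin and \righthyphenmin.

```python
import re


def parse_pattern(pat):
    """Split a pattern such as '5moc1ra1t' into its letters and the
    numeric values that sit in the gaps around those letters."""
    letters = re.sub(r"\d", "", pat)
    values = [0] * (len(letters) + 1)
    pos = 0
    for ch in pat:
        if ch.isdigit():
            values[pos] = int(ch)  # value for the gap before letters[pos]
        else:
            pos += 1
    return letters, values


def hyphenate(word, patterns, left_min=2, right_min=3):
    """Hypothetical helper: apply Liang-style patterns to one word."""
    padded = "." + word.lower() + "."   # dots mark the word boundaries
    points = [0] * (len(padded) + 1)    # points[k] = gap before padded[k]
    for letters, values in map(parse_pattern, patterns):
        start = padded.find(letters)
        while start != -1:
            # Where several patterns cover the same gap, the highest wins.
            for offset, value in enumerate(values):
                points[start + offset] = max(points[start + offset], value)
            start = padded.find(letters, start + 1)
    pieces, last = [], 0
    for i in range(left_min, len(word) - right_min + 1):
        # Odd values allow a break before word[i]; even values forbid it.
        if points[i + 1] % 2 == 1:
            pieces.append(word[last:i])
            last = i
    pieces.append(word[last:])
    return "-".join(pieces)


# Patterns quoted in the report as present in Plain TeX.
PLAIN = ["1mo", "4mocr", "5crat."]
# The additional pattern found in hyph-en-us.pat.txt.
EXTENDED = PLAIN + ["5moc1ra1t"]

print(hyphenate("democrat", PLAIN))        # demo-crat
print(hyphenate("democrat", EXTENDED))     # de-mo-c-rat
print(hyphenate("democrats", EXTENDED))    # de-moc-rats
print(hyphenate("democratism", EXTENDED))  # de-moc-ra-tism
```

In this reduced setting the 5 before "m" in the added pattern overrides the
even 4 from "4mocr", and the 1 between "c" and "r" opens a break that the
Plain TeX patterns never allowed, which reproduces both bad hyphenations
from the report while still giving the intended "de-moc-ra-tism".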