[tex-hyphen] hyphen.tex and hyph-en-us.pat.tex

Stephan Hennig mailing_list at arcor.de
Tue Jan 28 12:00:20 CET 2014


Is there a file containing a pure text version of the patterns in file

And how do I activate patterns from file hyph-en-us.pat.txt when using
LuaLaTeX?  Neither babel nor polyglossia seems to be using those
patterns.  With the attached LaTeX file, I get the following
hyphenations of the word 'catastrophe':

   Polyglossia [variant=usmax]     catastrophe
   Babel       [USenglish]         catas-trophe

Attached is a small script which greps file hyph-en-us.pat.txt for
candidate patterns of the word 'catastrophe' dealing with the position
between letters t and a.  There are 16 of them and inspection shows that
pattern 'cat1a1s2' is the only real match.  According to that pattern,
cat-astrophe should be a valid hyphenation.  Why doesn't LaTeX show that?
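As a cross-check on that manual inspection, here is a minimal Python sketch. It is not part of the attached script or of padrinoma; the function names are made up for illustration. It takes the nine patterns that debug_spots.lua reports for 'catastrophe' and asks which of them place an explicit digit at the break after the first three letters (cat|astrophe). It assumes the usual reading of a pattern: digits sit between letters, and '.' marks a word boundary.

```python
# Patterns copied from the debug_spots.lua output quoted in this post.
# The helper names below are hypothetical, chosen for illustration only.
PATTERNS = ["1ca", ".ca4t", "1ta", "a2ta", "cat1a1s2",
            "st4r", "as1tr", "2oph", "tro5phe"]


def parse_pattern(pattern):
    """Return (letters, {offset: digit}) for one pattern string."""
    letters = ""
    digits = {}
    for ch in pattern:
        if ch.isdigit():
            # The digit sits after len(letters) characters of the pattern.
            digits[len(letters)] = int(ch)
        else:
            letters += ch
    return letters, digits


def digits_at_break(pattern, word, k):
    """Explicit digits that `pattern` puts at the break after k letters."""
    dotted = "." + word + "."
    letters, digits = parse_pattern(pattern)
    found = []
    start = dotted.find(letters)
    while start != -1:
        offset = (k + 1) - start  # +1 accounts for the leading dot
        if offset in digits:
            found.append(digits[offset])
        start = dotted.find(letters, start + 1)
    return found


for p in PATTERNS:
    hits = digits_at_break(p, "catastrophe", 3)
    if hits:
        print(p, hits)  # prints: cat1a1s2 [1]
```

Among these nine patterns, only 'cat1a1s2' contributes a digit at that position, and the digit is odd, which agrees with the inspection described above.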

Well, the shell script is a complicated way of inspecting patterns.  In
the padrinoma repository[1], you can find a Lua script that decomposes
the words from standard input into patterns from a given file:

> $ echo catastrophe | texlua debug_spots.lua -p hyph-en-us.pat.txt
> boundary letter: '.'
> spot mins: 2 2
> pattern file: d:/texlive/2013/texmf-dist/tex/generic/hyph-utf8/patterns/txt/hyph-en-us.pat.txt
> 4938 patterns read.
>  . c a t a s t r o p h e .
>   1c a
>  . c a4t
>       1t a
>      a2t a
>    c a t1a1s2
>            s t4r
>          a s1t r
>                 2o p h
>              t r o5p h e
>  .0c0a4t1a1s2t4r2o5p0h0e0.
> cat-a-stro-phe

This output also shows a hyphenation point between the letters t and a.
Since it agrees with the manual check, I am fairly confident that the
Lua code isn't wrong.
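The decomposition above can be reproduced with a short Python sketch of the pattern merging step. This is not the padrinoma code, and the names are hypothetical. It assumes that "spot mins: 2 2" means at least two letters must remain on each side of a break, that the largest digit wins where several patterns compete for the same position, and that an odd value permits a break.

```python
# Patterns copied from the debug_spots.lua output above; the function
# names and default arguments are illustrative assumptions.
PATTERNS = ["1ca", ".ca4t", "1ta", "a2ta", "cat1a1s2",
            "st4r", "as1tr", "2oph", "tro5phe"]


def parse_pattern(pattern):
    """Return (letters, {offset: digit}) for one pattern string."""
    letters = ""
    digits = {}
    for ch in pattern:
        if ch.isdigit():
            digits[len(letters)] = int(ch)
        else:
            letters += ch
    return letters, digits


def hyphenate(word, patterns, left_min=2, right_min=2):
    """Merge all matching patterns and break where the value is odd."""
    dotted = "." + word + "."
    # values[g] belongs to the gap in front of dotted[g].
    values = [0] * (len(dotted) + 1)
    for pattern in patterns:
        letters, digits = parse_pattern(pattern)
        start = dotted.find(letters)
        while start != -1:
            for offset, digit in digits.items():
                gap = start + offset
                values[gap] = max(values[gap], digit)
            start = dotted.find(letters, start + 1)
    pieces = []
    last = 0
    # k is the number of letters in front of a candidate break.
    for k in range(left_min, len(word) - right_min + 1):
        if values[k + 1] % 2 == 1:
            pieces.append(word[last:k])
            last = k
    pieces.append(word[last:])
    return "-".join(pieces)


print(hyphenate("catastrophe", PATTERNS))  # prints: cat-a-stro-phe
```

With only these nine patterns as input, the sketch yields the same break points as the Lua script, including the one between t and a.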

Best regards,
Stephan Hennig

[1] <URL:https://github.com/sh2d/padrinoma>, see directory
examples/debug_spots/ and call 'texlua debug_spots.lua' for help
-------------- next part --------------
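#-- -*- coding: utf-8 -*-
# First grep: patterns built only from the letters of 'catastrophe', in word order.
# Second grep: keep those with a digit right after a 't' or right before an 'a'.
grep '^\.*[0-9]*c*[0-9]*a*[0-9]*t*[0-9]*a*[0-9]*s*[0-9]*t*[0-9]*r*[0-9]*o*[0-9]*p*[0-9]*h*[0-9]*e*[0-9]*\.*$' "$(kpsewhich hyph-en-us.pat.txt)" \
| grep -E 't[0-9]|[0-9]a'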