[texhax] TeX hyphenation -- why do so many words get no hyphens

Petr Sojka sojka at informatics.muni.cz
Thu Aug 5 21:01:34 CEST 2004


On Wed, Aug 04, 2004 at 07:06:21PM -0700, Pierre MacKay wrote:
 
> Steve Tolkin's list is rather a challenge, and suggests that even the
> patterns should be looked at again.  I do not remember what hyphenation
> list was used to start the generation of hyphen.tex, but it seems to
> have missed some real possibilities.  

It was a _very_ small wordlist: 49858 words with 88420 hyphenation
points marked, created from Webster's Pocket dictionary (31036 words)
by adding derived forms. (For comparison, in Czech we have 5000000+
word forms derived from a database of 200000+ word stems.)

> It ought to be pretty safe to make hypo, para, and even epi and apo
> into pre-hyphen groups tied to start-of-word.

It may be safe, but it need not be -- checking against a huge wordlist
is the safest way.
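
For a quick spot check (not a substitute for the wordlist test), plain
TeX and LaTeX both provide \showhyphens, which writes the hyphenation
points found by the currently loaded patterns to the log. A minimal
sketch; the four test words are my own picks for the prefixes
mentioned above, not taken from Steve Tolkin's list:

```tex
% Reports in the log where the loaded patterns allow breaks.
% The words are illustrative only: one per prefix (hypo, para, epi, apo).
\showhyphens{hypothesis paragraph epigraph apology}
```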
 
> I find it interesting that I have run into so few problems of this sort
> in ten years of professional typesetting.  

In English you are lucky that the frequent words are very short.
And global paragraph optimization in TeX is clever/forgiving --
you may simply not have noticed that you could do better. But these
cases are rare, as every added hyphen increases the paragraph's
demerits (by how much depends on \hyphenpenalty). But try pushing
TeX harder by setting higher \adjdemerits, \doublehyphendemerits and
\finalhyphendemerits.
And do you typeset multicolumn stuff?
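
A minimal sketch of such an experiment, assuming plain TeX, whose
defaults are \hyphenpenalty=50, \adjdemerits=10000,
\doublehyphendemerits=10000 and \finalhyphendemerits=5000. The raised
values below are arbitrary choices for illustration, not recommended
settings:

```tex
% Plain TeX defaults: \adjdemerits=10000, \doublehyphendemerits=10000,
% \finalhyphendemerits=5000. The doubled values are illustrative only.
\adjdemerits=20000           % adjacent lines of very different tightness
\doublehyphendemerits=20000  % two consecutive lines ending in a hyphen
\finalhyphendemerits=10000   % hyphen at the end of the second-last line
```

Typesetting the same text in a narrow column with and without these
settings should show how much the line breaker depends on finding
good hyphenation points.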
 
> But I still think that
> we might propose improvements in hyphens.tex to Don Knuth.

Changing hyphen.tex is a no-go (backward compatibility), but developing
_much better_ new American English patterns _in addition_ to hyphen.tex
is the way to go. You can have patterns for 256 languages simultaneously
in a format file today.
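
A rough sketch of how a second pattern set could sit next to
hyphen.tex when a plain TeX format is built with INITEX (\patterns is
only allowed there). The names \newamerican and newushyph.tex are
hypothetical; no such file exists yet. With LaTeX and babel the
loading would normally go through language.dat instead:

```tex
% Only works while building a format with INITEX.
% \newamerican and newushyph.tex are hypothetical names.
\newlanguage\newamerican   % allocate a free language number
\language=\newamerican     % patterns read now belong to this language
\input newushyph.tex       % file holding the new \patterns{...}
\language=0                % back to the original hyphen.tex patterns
```

A document then picks the new patterns for a stretch of text with
\language=\newamerican, and old documents keep their line breaks
because language 0 is untouched.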

Best
--ps


