[texhax] TeX hyphenation -- why do so many words get no hyphens

Tolkin, Steve Steve.Tolkin at FMR.COM
Wed Aug 4 22:13:57 CEST 2004


Summary:
It seems that the hyphenation approach used by TeX fails to insert
any hyphens in many words.  Or am I doing something wrong?

Details:
I am not a TeX user; I am simply using its code that hyphenates words.
I think I am using the standard set of rules in the ushyph1.tex file.
(I used these via the Perl module TeX::Hyphen.)
I ran this on a large word list (the YAWL list
http://metalab.unc.edu/pub/Linux/libs/yawl-0.3.tar.gz
reachable from e.g.
http://personal.riverusers.com/~thegrendel/software.html ).
I discovered that for many words, even long words with many syllables,
it does not insert any hyphens (actually, possible hyphenation points).

Here are some examples of words that have no hyphenation points:

whether achieved promised spokesman widespread database briefly princess
medicine surgery singing stronger sergeant lawyers guardian marginal
wildlife refugees painful overnight essence tragedy monopoly molecules
evenings sovereign timetable lifestyle strongest marathon salaries
manuscript lunchtime providers archives aerospace headache bankrupt
lightweight monastery paramount galaxy wheelchair exquisite paradigm
volatile postscript hypocrisy metaphors takeovers labyrinth stronghold
monotonous grievance wheelchairs foothold fledgling prostaglandin
trichloroethylenes

This is actually a small subset. There are more than 20000 words
that I think should have at least one hyphenation point, but which do
not have any.

Is it really the case that TeX does not hyphenate these words?
Or am I doing something wrong?

I think the algorithm needs another step -- if the word is long enough
and no hyphenation point was found, try harder to hyphenate it.
Possibly this could be done by adding more patterns at a lower level
(i.e. with a smaller integer).
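
To make the idea of pattern levels concrete, here is a minimal sketch in
Python of pattern-based hyphenation as I understand it: odd digits in a
pattern allow a break, even digits forbid one, and the largest digit seen
at each position wins. The names parse_pattern, break_points and show are
my own, and the patterns are toy examples invented for illustration; they
are NOT taken from ushyph1.tex and this is not the TeX::Hyphen code. The
left_min=2 and right_min=3 defaults mirror TeX's usual \lefthyphenmin and
\righthyphenmin.

```python
def parse_pattern(pattern):
    """Split a pattern such as 'a2ta' into its letters ('ata') and the
    digit for each inter-letter gap ([0, 2, 0, 0]); missing digits are 0."""
    letters = []
    levels = [0]
    for ch in pattern:
        if ch.isdigit():
            levels[-1] = int(ch)
        else:
            letters.append(ch)
            levels.append(0)
    return "".join(letters), levels


def break_points(word, patterns, left_min=2, right_min=3):
    """Return the offsets k at which word[:k] may be followed by a hyphen."""
    table = dict(parse_pattern(p) for p in patterns)
    text = "." + word.lower() + "."        # '.' marks the word boundaries
    scores = [0] * (len(text) + 1)         # scores[i] = gap before text[i]
    for start in range(len(text)):
        for end in range(start + 1, len(text) + 1):
            levels = table.get(text[start:end])
            if levels:
                for offset, level in enumerate(levels):
                    pos = start + offset
                    scores[pos] = max(scores[pos], level)
    # The gap before word[k] is the gap before text[k + 1].
    return [k for k in range(left_min, len(word) - right_min + 1)
            if scores[k + 1] % 2 == 1]


def show(word, points):
    """Render a word with hyphens inserted at the given offsets."""
    pieces, prev = [], 0
    for k in points:
        pieces.append(word[prev:k])
        prev = k
    pieces.append(word[prev:])
    return "-".join(pieces)


# Toy patterns, hypothetical and for illustration only.
TOY_PATTERNS = ["1ba", "1ta", "a2ta"]

if __name__ == "__main__":
    # 'a2ta' (level 2) overrides '1ta' (level 1), so only one break remains.
    print(show("database", break_points("database", TOY_PATTERNS)))
    # No toy pattern matches 'lunchtime', so it gets no break point at all.
    print(show("lunchtime", break_points("lunchtime", TOY_PATTERNS)))
    # Adding one more level-1 pattern gives it a break point.
    more = TOY_PATTERNS + ["1ti"]
    print(show("lunchtime", break_points("lunchtime", more)))
```

With these toy patterns a word that matches no odd-level pattern simply
comes back with no break points, which looks like what I am seeing. A
pattern as crude as "1ti" would surely produce bad breaks in other words,
so any added low-level patterns would still need higher-level patterns to
veto the wrong cases.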

P.S. I am sending essentially the same message a second time
because the first one was not put into the mailing list archive.
At http://tug.org/pipermail/texhax/2004-August/002484.html
it simply says: Skipped content of type multipart/alternative
and has no other content at all.  So I am sending this message
from Outlook with format "Plain Text".  Ideally the
mailing list archives would handle this better -- I believe
that one of the formats in the original message is plain text.

Thanks,
Steve
-- 
Steve Tolkin    Steve . Tolkin at FMR dot COM   617-563-0516 
Fidelity Investments   82 Devonshire St. V4D     Boston MA 02109
There is nothing so practical as a good theory.  Comments are by me, 
not Fidelity Investments, its subsidiaries or affiliates.


