[tex-hyphen] weighting hyphenation points (was: hyphenation (what else ; -))

Stephan Hennig mailing_list at arcor.de
Mon May 17 13:01:00 CEST 2010


On 17.05.2010 00:55, Mojca Miklavec wrote:

>>  From a readability point of view 'lava-bo' is better for me since one can
>> guess the rest of the word (whereas you can't guess the rest of la-)
>
> <not-to-be-taken-seriously>
> Oh, and yes ... I was already wondering when somebody will come up
> with the idea to extend TeX with tolerances for preferable breaking
> points in addition to the allowed ones :) :) :)
> </not-to-be-taken-seriously>

Incidentally, I had an email conversation about this with Taco and
Werner a couple of weeks ago.  The good news is that, I think, Taco has
this on his list.  Here's a sketch of the approach as I understand it
(ignoring libhnj for now).

Hyphenation points can be weighted by applying multiple pattern sets in
parallel, each with its own weight attached.  That is, if a match
exists in, e.g., a compound word pattern set, then that hyphenation
point will be weighted higher than a regular hyphenation point.  If
several pattern sets match at the same position, the highest weight
wins.

Consider these pattern sets

   * regular pattern set with an attached weight of 10:

       n1n a1d

   * compound word pattern set with an attached weight of 20:

       en1nad

and the compound word "Tannennadel" (fir needle).  The regular pattern 
set has matches

   Tan-nen-na-del

weighting each hyphenation point equally (10 or whatever).  Compound 
word patterns find the match

   Tannen-nadel

weighting that match 20.  Finally, during paragraph breaking, 
hyphenation weights will be

   Tan-nen-na-del
      10  20 10

Therefore breaking the word at the compound boundary Tannen-nadel will
be (slightly) preferred.
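
To make the example concrete, here is a minimal sketch in Python of how
the weights above could be computed.  It assumes simplified Liang-style
matching (no word-boundary dots, no minimum distance from the word
edges), and all function names are made up for illustration; this is
not an existing TeX or libhnj interface, and it says nothing about how
the paragraph breaker would use the weights.

```python
def parse_pattern(pattern):
    """Split a pattern such as 'en1nad' into its letters and the
    inter-letter levels (levels[i] sits just before letter i)."""
    letters = []
    levels = [0]
    for ch in pattern:
        if ch.isdigit():
            levels[-1] = int(ch)
        else:
            letters.append(ch)
            levels.append(0)
    return "".join(letters), levels


def break_positions(word, patterns):
    """Positions (index of the letter after the break) allowed by one
    pattern set: highest level wins, odd levels permit a break."""
    word = word.lower()
    levels = [0] * (len(word) + 1)
    for pattern in patterns:
        letters, pat_levels = parse_pattern(pattern)
        start = word.find(letters)
        while start != -1:
            for offset, level in enumerate(pat_levels):
                pos = start + offset
                levels[pos] = max(levels[pos], level)
            start = word.find(letters, start + 1)
    return {i for i in range(1, len(word)) if levels[i] % 2 == 1}


def weighted_breaks(word, weighted_sets):
    """Apply every (patterns, weight) pair; where several sets allow
    the same break, keep the highest weight."""
    weights = {}
    for patterns, weight in weighted_sets:
        for pos in break_positions(word, patterns):
            weights[pos] = max(weights.get(pos, 0), weight)
    return weights


def show(word, weights):
    """Render the word with a hyphen at every weighted position."""
    parts, prev = [], 0
    for pos in sorted(weights):
        parts.append(word[prev:pos])
        prev = pos
    parts.append(word[prev:])
    return "-".join(parts)


REGULAR = (["n1n", "a1d"], 10)   # regular pattern set, weight 10
COMPOUND = (["en1nad"], 20)      # compound word pattern set, weight 20

weights = weighted_breaks("Tannennadel", [REGULAR, COMPOUND])
print(show("Tannennadel", weights))   # Tan-nen-na-del
print(sorted(weights.items()))        # [(3, 10), (6, 20), (8, 10)]
```

The position after "Tannen" is found by both sets, and the higher
weight of 20 is the one that is kept.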

Best regards,
Stephan Hennig

