[tex-hyphen] weighting hyphenation points (was: hyphenation (what else ; -))
Mojca Miklavec
mojca.miklavec.lists at gmail.com
Tue May 18 21:12:26 CEST 2010
On Mon, May 17, 2010 at 13:01, Stephan Hennig wrote:
> Am 17.05.2010 00:55, schrieb Mojca Miklavec:
>
>>> From a readability point of view 'lava-bo' is better for me since one
>>> can guess the rest of the word (whereas you can't guess the rest of la-)
>>
>> <not-to-be-taken-seriously>
>> Oh, and yes ... I was already wondering when somebody would come up
>> with the idea to extend TeX with tolerances for preferable breaking
>> points in addition to the allowed ones :) :) :)
>> </not-to-be-taken-seriously>
>
> Incidentally, I had a mail conversation about this with Taco and Werner a
> couple of weeks ago. The good news is, I think Taco has this on his list.
> Here's a sketch of the approach as I understand it (ignoring libhnj for
> now).
>
> Hyphenation points can be weighted by applying multiple pattern sets in
> parallel that have different weights attached. That is, if a match exists
> in, e.g., a compound word pattern set, then that hyphenation point will be
> weighted higher than a regular hyphenation point. If several pattern sets
> match at the same position, the highest weight wins.
>
> Consider these pattern sets
>
> * regular pattern set with an attached weight of 10:
>
> n1n a1d
>
> * compound word pattern set with an attached weight of 20:
>
> en1nad
>
> and the compound word "Tannennadel" (fir needle). The regular pattern set
> has matches
>
> Tan-nen-na-del
>
> weighting each hyphenation point equally (10 or whatever). Compound word
> patterns find the match
>
> Tannen-nadel
>
> weighting that match 20. Finally, during paragraph breaking, hyphenation
> weights will be
>
> Tan-nen-na-del
>    10  20 10
>
> Therefore breaking the word at the compound boundary Tannen-nadel will be
> (slightly) preferred.
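
The scheme quoted above can be tried out with a small Python sketch. It is
only an illustration under simplifying assumptions, not how LuaTeX or libhnj
work internally: patterns are matched as plain substrings, word-boundary
dots and hyphenmin limits are ignored, and all function names
(parse_pattern, break_points, weighted_breaks, show) are made up for this
example. Each pattern set is applied on its own, and where several sets
allow the same break, the highest weight is kept.

```python
import re

def parse_pattern(pattern):
    """Split a Liang-style pattern such as 'en1nad' into its letters and
    the numeric level at each inter-letter position."""
    letters = re.sub(r"\d", "", pattern)
    levels = [0] * (len(letters) + 1)
    pos = 0
    for ch in pattern:
        if ch.isdigit():
            levels[pos] = int(ch)
        else:
            pos += 1
    return letters, levels

def break_points(word, patterns):
    """Return the indices i where one pattern set allows a break before word[i].
    Within a set the highest level wins and odd levels allow a break."""
    w = word.lower()
    levels = [0] * (len(w) + 1)
    for pattern in patterns:
        letters, plevels = parse_pattern(pattern)
        start = w.find(letters)
        while start != -1:
            for k, lvl in enumerate(plevels):
                levels[start + k] = max(levels[start + k], lvl)
            start = w.find(letters, start + 1)
    return {i for i, lvl in enumerate(levels) if lvl % 2 == 1 and 0 < i < len(w)}

def weighted_breaks(word, pattern_sets):
    """Apply every (patterns, weight) pair; the highest weight wins per position."""
    weights = {}
    for patterns, weight in pattern_sets:
        for i in break_points(word, patterns):
            weights[i] = max(weights.get(i, 0), weight)
    return weights

def show(word, weights):
    """Render the word with a hyphen at every weighted break point."""
    return "".join(("-" if i in weights else "") + ch for i, ch in enumerate(word))

# The two pattern sets from the example, with their assumed weights.
pattern_sets = [
    (["n1n", "a1d"], 10),   # regular patterns
    (["en1nad"], 20),       # compound word patterns
]
weights = weighted_breaks("Tannennadel", pattern_sets)
print(show("Tannennadel", weights))   # Tan-nen-na-del
print(sorted(weights.items()))        # [(3, 10), (6, 20), (8, 10)]
```

How a paragraph breaker would turn such weights into penalties is left open
here; the sketch only shows how the per-position weights could be derived.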

Thanks for the really nice outline. I didn't mean it too seriously,
but now I have a "problem": I'll have to find a list of preferred
hyphenation points somewhere, while I don't even understand our exact
rules :)

It seems that at some point we'll have to start splitting
luatex-specific patterns (with advanced features) from regular ones.
That might already be the case: I have a feeling that Hungarian
might have an improved set of patterns that could be used in luatex
and only in luatex.

Mojca