[tex-hyphen] weighting hyphenation points
Stephan Hennig
mailing_list at arcor.de
Wed May 19 12:00:22 CEST 2010
On 18.05.2010 21:12, Mojca Miklavec wrote:
> On Mon, May 17, 2010 at 13:01, Stephan Hennig wrote:
>>
>> Therefore breaking the word at the word compound Tannen-nadel will be
>> (slightly) preferred.
>
> Thanks for the really nice outline. I didn't mean it too seriously,
> but now I'll have a "problem" that I'll have to find a list of
> preferred hyphenation points somewhere, while I don't even understand
> our exact rules :)
Along the same lines, valid but undesirable hyphenations can be
suppressed. While we currently forbid undesirable hyphenations by
hard-coding them in the regular patterns, like
    allowed                 undesirable
    An-alpha-bet            Anal-phabet
    Tal-entwäs-se-rung      Talent-wässerung
    Text-illus-tra-ti-on    Textil-lustration
those hyphenations could just as well be forbidden by a special pattern
set with a very high corresponding weight. As a use case, it might be
necessary to allow even undesirable hyphenations when setting text in
narrow columns, which is impossible with hard-coded suppression.
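To make the idea concrete, here is a minimal sketch in Python. It is not
how TeX, LuaTeX or any existing pattern set works: the word lists stand
in for real patterns, and the names (VALID, UNDESIRABLE, BASE_PENALTY,
UNDESIRABLE_PENALTY, usable_breaks) and the penalty values are made up
for illustration. It only shows how a second, heavily weighted set could
discourage a break without removing it from the regular set.

```python
def break_positions(hyphenated):
    """Return the character offsets of the hyphens in e.g. 'An-alpha-bet'."""
    positions, offset = [], 0
    for part in hyphenated.split("-")[:-1]:
        offset += len(part)
        positions.append(offset)
    return positions


# Hypothetical stand-ins for two pattern sets (real sets would be patterns,
# not word lists). VALID contains every valid break, desirable or not.
VALID = {
    "Analphabet": set(break_positions("An-alpha-bet"))
    | set(break_positions("Anal-phabet")),
}
UNDESIRABLE = {
    "Analphabet": set(break_positions("Anal-phabet")),
}

# Assumed weights: a small penalty for any break, a very high extra penalty
# for breaks matched by the "undesirable" set.
BASE_PENALTY = 50
UNDESIRABLE_PENALTY = 10000


def weighted_breaks(word):
    """Map each valid break position of `word` to its summed penalty."""
    penalties = {pos: BASE_PENALTY for pos in VALID.get(word, ())}
    for pos in UNDESIRABLE.get(word, ()):
        if pos in penalties:
            penalties[pos] += UNDESIRABLE_PENALTY
    return penalties


def usable_breaks(word, max_penalty):
    """Return the sorted break positions whose penalty is at most `max_penalty`."""
    return sorted(
        pos for pos, penalty in weighted_breaks(word).items()
        if penalty <= max_penalty
    )


# Normal text: the break after "Anal" is effectively suppressed.
print(usable_breaks("Analphabet", max_penalty=100))
# Narrow column: a higher tolerance lets the undesirable break through.
print(usable_breaks("Analphabet", max_penalty=20000))
```

With hard-coded suppression the break after "Anal" would simply not
exist in VALID, so no tolerance setting could bring it back.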
We don't use fixed rules to categorize undesirable hyphenations; we
just use common sense. This is perhaps easier than categorizing
preferred hyphenations, for other languages as well.
> It seems that at some point we'll have to start splitting
> luatex-specific patterns (with advanced features) from regular ones
Probably. Having multiple pattern sets applied in parallel opens the
door for automatic application of non-standard hyphenation, long-s (for
black-letter fonts) and intelligent ligature recognition. I don't know
which of the corresponding pattern sets hyph-utf8 wants to manage. But
Polyglossia should probably be able to deal with all that in the future.
Unfortunately, we don't have a Polyglossia hacker in our team to
contribute more than pipe dreams. :( (Well, we can contribute test
patterns.)
Best regards,
Stephan Hennig