[tex-hyphen] weighting hyphenation points

Stephan Hennig mailing_list at arcor.de
Wed May 19 12:00:22 CEST 2010


On 18.05.2010 21:12, Mojca Miklavec wrote:
> On Mon, May 17, 2010 at 13:01, Stephan Hennig wrote:
>>
>> Therefore breaking the word at the word compound Tannen-nadel will be
>> (slightly) preferred.
>
> Thanks for the really nice outline. I didn't mean it too seriously,
> but now I'll have a "problem" that I'll have to find a list of
> preferred hyphenation points somewhere, while I don't even understand
> our exact rules :)

Along the same lines, valid but undesirable hyphenations can be 
suppressed.  While we currently forbid undesirable hyphenations by 
hard-coding them in the regular patterns, as in

     allowed                   undesirable

   An-alpha-bet              Anal-phabet
   Tal-entwäs-se-rung        Talent-wässerung
   Text-illus-tra-ti-on      Textil-lustration

those hyphenations could just as well be forbidden by a special pattern 
set with a very high corresponding weight.  As a use case, it might be 
necessary to allow even undesirable hyphenations when setting text in 
narrow columns, which is impossible with hard-coded suppression.
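
To illustrate the idea, here is a minimal Python sketch.  It is not how 
any existing package works: the data layout, the function names and the 
weights are all assumptions made up for this example, and the break 
positions are entered by hand instead of being derived from patterns.

```python
# Illustrative only: data layout, names and weights are assumptions.
BASE_PENALTY = 50         # assumed cost of an ordinary hyphenation point
SUPPRESS_WEIGHT = 10_000  # assumed "very high" weight of the special set

# Every valid break position (index into the word), undesirable included.
VALID_BREAKS = {
    "Analphabet": {2, 4, 7},
    "Textillustration": {4, 6, 9, 12, 14},
}

# Positions matched by the hypothetical suppression pattern set.
SUPPRESSED = {
    "Analphabet": {4},        # Anal-phabet
    "Textillustration": {6},  # Textil-lustration
}


def penalties(word):
    """Map each valid break position to its combined penalty."""
    suppressed = SUPPRESSED.get(word, set())
    return {
        pos: BASE_PENALTY + (SUPPRESS_WEIGHT if pos in suppressed else 0)
        for pos in sorted(VALID_BREAKS.get(word, set()))
    }


def usable_breaks(word, max_penalty):
    """Return the break positions whose penalty fits the given budget."""
    return [pos for pos, p in penalties(word).items() if p <= max_penalty]


def show(word, breaks):
    """Render a word with hyphens at the given positions."""
    parts, start = [], 0
    for pos in breaks:
        parts.append(word[start:pos])
        start = pos
    parts.append(word[start:])
    return "-".join(parts)
```

With a normal budget, show("Analphabet", usable_breaks("Analphabet", 100)) 
yields An-alpha-bet.  Raising the budget above the suppression weight, 
as one might for narrow columns, makes the undesirable point available 
again, which hard-coded suppression cannot do.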

We don't use fixed rules to categorize undesirable hyphenations, but 
just use common sense.  This is perhaps easier than categorizing 
preferred hyphenations, for other languages as well.


> It seems that at some point we'll have to start splitting
> luatex-specific patterns (with advanced features) from regular ones

Probably.  Having multiple pattern sets applied in parallel opens the 
door to automatic application of non-standard hyphenation, long s (for 
blackletter fonts) and intelligent ligature recognition.  I don't know 
which of the corresponding pattern sets hyph-utf8 wants to manage.  But 
Polyglossia should probably be able to deal with all that in the future.  
Unfortunately, we don't have a Polyglossia hacker on our team to 
contribute more than pipe dreams. :(  (Well, we can contribute test 
patterns.)

Best regards,
Stephan Hennig

