[tex-hyphen] German hyphenation of "Methode"

Thu Apr 9 21:26:58 CEST 2015

On Thu, Apr 09, 2015 at 10:19:37AM +0200, Pander wrote:
> On 04/09/2015 10:04 AM, Élie Roux wrote:
> >> I think it's a very reasonable suggestion. I guess it wasn't in TeX 
> >> originally because patgen could handle this, language wide. 
> > 
> > I'm sorry, I'm no expert in the field... I admit I don't really
> > understand what you mean by that? Do you mean that patgen can generate
> > exceptions for words < x letters?
> > 
> >> A more nuanced approach (which might not be too much additional work):
> >>
> >> \shorthyphenpenalty : A hyphenation penalty to apply for short words, 
> >> 	defaulting to something big but not impossible.
> >> \shorthyphenlen    : length of a word to consider short.
> >>
> >> I can see this being a useful approach - it doesn't hyphenate short words
> >> normally, but it *can* if the alternative is really terrible (e.g. a small
> >> column which would otherwise be an overfull box).
> >>
> >> \shorthyphenlen would default to 1 so that existing documents are unaffected.
> > 
> > A very interesting idea, I'll add it on
> > 
> > http://tracker.luatex.org/view.php?id=930

I second this approach, because
a) it is backward compatible
b) it allows for small patterns
c) it may be implemented very efficiently

Re b): For generating patterns with patgen2 (or opatgen),
I have always suggested generating them with \lefthyphenmin=1
and \righthyphenmin=1, as this setting allows patgen to generalize
language `syllables' with a smaller number of short patterns,
compared to a setup where most short words, prefixes and
suffixes have to be covered by longer patterns as `exceptions'
to the general syllabification rules/patterns (CV- CCV- etc.).

Re c): it allows the paragraph algorithm simply not to call
the hyphenation routine for a word that is shorter than \shorthyphenlen
when \shorthyphenpenalty is greater than 10000.
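
To make the shortcut in c) concrete, here is a rough sketch in Python
rather than in the engine itself. Everything in it is an assumption made
for illustration: the function name candidate_breaks, its parameters, and
the `hyphenate' callback standing in for the pattern-based routine are all
hypothetical, and neither \shorthyphenlen nor \shorthyphenpenalty exists
in any engine today.

```python
def candidate_breaks(word, hyphenate, hyphen_penalty,
                     short_hyphen_len, short_hyphen_penalty):
    """Return a list of (position, penalty) pairs for one word.

    `hyphenate` is a hypothetical callback that maps a word to a list
    of break positions, standing in for the pattern-based routine.
    """
    is_short = len(word) < short_hyphen_len

    # The shortcut from c): a short word whose penalty forbids any break
    # never reaches the hyphenation routine at all.
    if is_short and short_hyphen_penalty > 10000:
        return []

    # Otherwise short words get the (large but finite) short-word
    # penalty, and all other words get the ordinary penalty.
    penalty = short_hyphen_penalty if is_short else hyphen_penalty
    return [(pos, penalty) for pos in hyphenate(word)]
```

With short_hyphen_len set to 1 no word counts as short, so every word
goes through the ordinary branch, which is the backward-compatible
behaviour mentioned in a).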

> I have been working on an international standard for hyphenation pattern
> definitions. The draft can be found here
> https://github.com/OpenTaal/hyphenation-definitions For that I have been
> collecting many real world examples for which patgen is not suitable.
> This will form the specifications for a next generation patgen. A
> community on whose work this RFC is based is already working on better
> hyphenation. If you need more info, please contact me.
I am interested, especially in an algorithm that compiles so many different
types of segmentation from a wordlist into one set of patterns. Is one
known?
I already showed in 1995 that one can have patterns for
some/every type of segmentation (compound, dynamic),
but that would imply having dozens of patterns for every
language.

Thank you!
Petr Sojka