[tex-hyphen] hyphenation (what else ;-)
arthur.reutenauer at normalesup.org
Mon May 17 08:36:48 CEST 2010
> I'm thinking about the idea that it might be clever to include a
> special chapter in the docs, describing those values for each
> language, in particular the "minimal" and "nice" values as you say.
There's also the issue that when patterns are generated by patgen,
those minimal values are really important, because otherwise you can
simply produce wrong hyphenation: if the patterns have been generated
with \righthyphenmin=3 and you set \righthyphenmin=2 for typesetting,
there's no guarantee the patterns will produce correct results. So it's
not only "theoretically" minimal.
(I'm glad I mentioned this before Peter Sojka reacted ;-)
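To make the point about patgen and \righthyphenmin concrete, here is a minimal sketch of Liang-style pattern matching in Python. It is not TeX's actual implementation; the function names (parse_pattern, break_points) and the two toy patterns "a1b" and "b1l" are made up for illustration and come from no real pattern file. The idea is that a pattern set tuned under righthyphenmin=3 never had to inhibit a break two letters from the end of a word, so lowering the value at typesetting time can expose a break that nobody checked.

```python
import re


def parse_pattern(pat):
    """Split a pattern such as 'b1l' into its letters and inter-letter values."""
    letters = re.sub(r"\d", "", pat)
    values = [0] * (len(letters) + 1)
    pos = 0
    for ch in pat:
        if ch.isdigit():
            values[pos] = int(ch)   # value sits in the gap before letters[pos]
        else:
            pos += 1
    return letters, values


def break_points(word, patterns, lefthyphenmin=2, righthyphenmin=3):
    """Return break positions, counted as the number of letters before the break."""
    padded = "." + word.lower() + "."
    scores = [0] * (len(padded) + 1)   # scores[i] is the gap before padded[i]
    for pat in patterns:
        letters, values = parse_pattern(pat)
        start = padded.find(letters)
        while start != -1:
            for k, v in enumerate(values):
                scores[start + k] = max(scores[start + k], v)
            start = padded.find(letters, start + 1)
    # Odd values allow a break, but only inside the hyphenmin window.
    first, last = lefthyphenmin, len(word) - righthyphenmin
    return [n for n in range(first, last + 1) if scores[n + 1] % 2 == 1]


# Toy patterns (hypothetical): fine as long as righthyphenmin=3 hides word ends.
TOY_PATTERNS = ["a1b", "b1l"]

print(break_points("problem", TOY_PATTERNS, 2, 3))  # [4]     prob-lem
print(break_points("table", TOY_PATTERNS, 2, 3))    # [2]     ta-ble
print(break_points("table", TOY_PATTERNS, 2, 2))    # [2, 3]  also tab-le
```

With righthyphenmin=3 the break in "tab-le" is never looked at, so nothing in the toy set needs to forbid it. With righthyphenmin=2 the same patterns suddenly allow it.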
In the case of French, the patterns have been mostly produced by hand,
so the problem may be slightly different.
> Another issue is Sanskrit and Indic scripts where Yves Codet argued
> that hyphenmin can be as little as 1 and he tried to eliminate other
> options with suitable patterns.
Right, and I thought we had corrected that... (note that those patterns
were produced by hand too).
> But such rules (forbidding the hyphenation before -le) should probably
> be part of patterns already, not necessarily handled with hyphenmin.
> But there has been a pretty lengthy discussion about French already.
... without any convincing conclusion, if I remember it correctly.
> Oh, and yes ... I was already wondering when somebody will come up
> with the idea to extend TeX with tolerances for preferable breaking
> points in addition to the allowed ones :) :) :)
I think David Kastrup already toyed with such an idea. He mentioned
it on the dev-luatex mailing-list one or two years ago.
> It's kind of a flaw in TeX - nothing gets hyphenated at all since
> Knuth didn't foresee that situation (and it's definitely not something
> that you would want to mimic). The same is true for composed words
> that don't get hyphenated at all unless some extra patterns are added.
> On the other hand if one sets the lefthyphenmin to 2 or 3 and sets
> lccode of apostrophe ... and then TeX determines that it's ok to break
> between i3ni, TeX will happily hyphenate l'i-ni-tia-tion even if
> breaking after the first i in i-ni-tia-tion is forbidden.
It's definitely a hard problem, not one that you can solve by
simply assigning categories to characters. The point with the
apostrophe is that it can be used both as a character to mark elision
between two separate words, and as part of a single word (in particular
in French, but in other languages too). You need to do computational
linguistics to solve the issue fully.
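The l'initiation case quoted above comes down to simple arithmetic on the hyphenmin window, which the following Python sketch spells out. It assumes the apostrophe has been given a nonzero \lccode, so that it counts as a letter of the word; the helper name candidate_window is hypothetical, and the sketch only models the window, not the patterns themselves.

```python
def candidate_window(word, lefthyphenmin=2, righthyphenmin=3):
    """Break positions (characters before the break) left open by the hyphenmin values."""
    return list(range(lefthyphenmin, len(word) - righthyphenmin + 1))


bare = "initiation"
elided = "l'initiation"   # apostrophe treated as a letter of the word

# The break after the first "i" sits at position 1 in the bare word,
# but at position 3 once "l'" is counted as part of the word.
after_first_i_bare = 1
after_first_i_elided = len("l'") + 1

print(after_first_i_bare in candidate_window(bare))      # False
print(after_first_i_elided in candidate_window(elided))  # True
```

So the break that lefthyphenmin=2 forbids in "initiation" falls inside the window for "l'initiation", and a pattern such as i3ni is then free to allow it. Measuring the window from the apostrophe instead would fix this example, but it would be wrong for words where the apostrophe is part of a single word, which is why the general problem is hard.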