# [tex-hyphen] String preparation

Arthur Reutenauer arthur.reutenauer at normalesup.org
Wed May 25 21:09:56 CEST 2016

> I think the main point is that it is not treated the same way as œ. For
> instance with right/lefthyphenmin = 2, we have
>
> œ́-di-pus
> œdi-pus

If that’s what the patterns do, they should be fixed, and \left and
\righthyphenmin set to something more useful.  That’s what we have for
Sanskrit, for example, where notionally TeX could break after any
grapheme at the beginning of a word, but we can’t specify that easily,
so \lefthyphenmin is set to 1 and there are patterns of (if I remember
correctly) up to 5 characters to prevent graphemes from being broken up.
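Both the quoted behaviour and the remedy (a low \lefthyphenmin plus blocking patterns) can be illustrated with a minimal Python sketch of Liang-style pattern matching followed by the hyphenmin check. This is a simplified model, not TeX's implementation. The patterns `1d`, `1p` and `.a2d` are toy examples invented for this illustration and are not taken from any real pattern file, and `parse_pattern` and `hyphenate` are hypothetical names. Characters are counted as code points, so `œ` counts as one character and `oe` as two.

```python
def parse_pattern(pat):
    """Split a Liang pattern such as 'a1b' into ('ab', [0, 1, 0])."""
    letters, values = [], [0]
    for ch in pat:
        if ch.isdigit():
            values[-1] = int(ch)  # digit applies to the gap before the next letter
        else:
            letters.append(ch)
            values.append(0)
    return "".join(letters), values


def hyphenate(word, patterns, left_min, right_min):
    """Insert '-' at the breaks allowed by the patterns and the hyphenmins."""
    table = dict(parse_pattern(p) for p in patterns)
    text = "." + word + "."  # '.' marks the word boundaries, as in TeX patterns
    # scores[j] is the highest pattern value seen in the gap just before text[j]
    scores = [0] * (len(text) + 1)
    for start in range(len(text)):
        for end in range(start + 1, len(text) + 1):
            values = table.get(text[start:end])
            if values:
                for k, v in enumerate(values):
                    scores[start + k] = max(scores[start + k], v)
    n = len(word)  # counts code points: 'œ' is one character, 'oe' is two
    out = []
    for i, ch in enumerate(word):
        # A break before word[i] leaves i characters on the left and n - i on
        # the right; it needs an odd score and must respect both hyphenmins.
        if i > 0 and scores[i + 1] % 2 == 1 and i >= left_min and n - i >= right_min:
            out.append("-")
        out.append(ch)
    return "".join(out)


TOY = ["1d", "1p"]  # toy patterns: allow a break before 'd' and before 'p'
print(hyphenate("oedipus", TOY, 2, 2))  # oe-di-pus
print(hyphenate("œdipus", TOY, 2, 2))   # œdi-pus
print(hyphenate("œdipus", TOY, 1, 2))   # œ-di-pus
print(hyphenate("adieu", TOY, 1, 2))             # a-dieu
print(hyphenate("adieu", TOY + [".a2d"], 1, 2))  # adieu
```

In the last line the even value 2 from `.a2d` outranks the odd value 1 from `1d` in the same gap, which is how patterns can suppress a break that \lefthyphenmin=1 would otherwise permit.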

> If I understand correctly, you're saying that in order to have œ treated
> as 2 characters (with lefthyphenmin=2), I should have lefthyphenmin=1
> and completely redo the patterns so that all the sequences of two
> characters at the beginning of a word do not get separated... If I
> understood correctly, that just looks terrible...

That’s what should have been done in the first place, of course.  Many
pattern sets have been generated for some specific values of \left and
\righthyphenmin and can’t be used with lower values because that could
cause wrong hyphenation.  This has been the case since Liang created
hyphen.tex.  This is particularly important when using patgen because it
expects a default value for hyphenmins, and will ignore whatever happens
below these thresholds (hence bad breaks could be generated); to be
honest, I’m a little surprised that hand-made patterns aren’t simply
designed to work with any values of \left and \righthyphenmin (and
likewise, patgen should nowadays really be run with all values set to
0).
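To make the threshold point concrete, here is a small Python sketch that takes one hand-hyphenated training word and lists the positions falling inside the hyphenmin margins, i.e. the positions a generator run with those thresholds would never examine. It is a simplified model of the behaviour described above, not patgen's actual code, and the function name `margin_report` is hypothetical. A position `i` means "after the first `i` characters".

```python
def margin_report(hyphenated, left_min, right_min):
    """For a training word such as 'oe-di-pus', map each position inside the
    hyphenmin margins to True if the word has a hyphen there, else False."""
    word = hyphenated.replace("-", "")
    breaks, seen = set(), 0
    for ch in hyphenated:
        if ch == "-":
            breaks.add(seen)  # hyphen after `seen` characters
        else:
            seen += 1
    n = len(word)
    # Positions closer to the edges than the hyphenmins are never examined,
    # so patterns built this way say nothing reliable about them.
    return {i: (i in breaks) for i in range(1, n) if i < left_min or n - i < right_min}


print(margin_report("oe-di-pus", 2, 3))  # {1: False, 5: False, 6: False}
print(margin_report("œ-di-pus", 2, 2))   # {1: True, 5: False}
print(margin_report("œ-di-pus", 1, 1))   # {}
```

A `True` entry is a correct break the patterns never learn; a `False` entry is a position where nothing prevents the patterns from producing a break, which becomes visible as soon as the hyphenmins are lowered. With the thresholds at their minimum, no position is skipped.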

Best,

Arthur