[tex-hyphen] new Catalan hyphenation patterns

Mojca Miklavec mojca.miklavec.lists at gmail.com
Fri Feb 22 10:52:11 CET 2013


On Mon, Feb 18, 2013 at 2:34 PM, Jaume Ortolà i Font wrote:
>
>> 3.) There are some dashes (-) in your patterns. Could you please
>> explain how they are used in your grammar? I'm asking because dashes
>> are tricky, and if we want to treat dashes as letters, this might have
>> some unexpected consequences, so it needs to be done right and with
>> sufficient testing. Office might work differently in that respect (I'm
>> not sure though), so the testing would also have to be done in TeX.
>> Some languages like Russian include the dashes, but they include *a
>> lot* of simple patterns with dashes. In other languages hyphenation of
>> compound words can be done in a different way (at least in TeX, I'm
>> not sure how Office deals with the problem).
>
> I remember the problem. In Catalan there are infinitives and gerunds
> ending in -ir and -int that would need a diaeresis in order to divide
> the syllables correctly. For example: con_tri_bu_ïr, con_tri_bu_ïnt.
> But, exceptionally, according to the orthographic rules, this diaeresis is
> omitted: con_tri_bu_ir, con_tri_bu_int. So we need patterns like "u1ir."
> (but "qu4ir."). Moreover, infinitives and gerunds can be joined to a pronoun
> with a hyphen: contribuir-hi, contribuint-hi. As LibreOffice considers the
> hyphen a word character, I had to add these patterns with hyphens ("u1ir-").
>
> In Catalan, the hyphen or dash character is always a hyphenation point, and
> the apostrophe is never a hyphenation point. Could either of these cause
> problems?

Regarding the dash: I didn't understand how these (and some other)
patterns are used:

u1ir- qu4ir- gu4ir- u1int- qu4int- gu4int-
e1ir- e1int- a1ir- a1int- o1ir- o1int-

In TeX, the dash is generally not considered a letter and is treated
as a word separator. As soon as a dash is introduced into the patterns,
it needs complete handling (possibly slightly different from what you
have now) to account for cases like "C-vitamin", "γ-ray", ... and other
compound words (two words separated by a dash). Otherwise you can
quickly end up hyphenating something like "C-v<hyphenation>itamin" (or,
to take an artificial example, "C-e<hyphenation>mulatió" if "emulatió"
meant a vitamin; I know it doesn't, but without speaking the language I
cannot come up with realistic examples). This kind of break is
probably not desirable.
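
To make the concern concrete, here is a minimal Python sketch of
Liang-style pattern matching in which the dash is simply treated as one
more letter. It is only an illustration under stated assumptions, not
how TeX or LibreOffice actually implement hyphenation: the names
parse_pattern and hyphenate are made up for this example, the minimums
of 2 characters on each side are assumed, and the pattern "e1mu" is
hypothetical (it is not taken from the Catalan pattern file). Only
"u1ir.", "qu4ir." and "u1ir-" come from the patterns discussed above.

```python
def parse_pattern(pattern):
    """Split a pattern like 'qu4ir.' into its letters and inter-letter values."""
    letters = ""
    values = [0]
    for ch in pattern:
        if ch.isdigit():
            values[-1] = int(ch)   # value sits before the next letter
        else:
            letters += ch
            values.append(0)
    return letters, values


def hyphenate(word, patterns, left_min=2, right_min=2):
    """Return the pieces of `word`; every character, including '-', is a letter.

    left_min / right_min are assumed minimum piece lengths at the word edges.
    """
    dotted = "." + word + "."          # dots mark the word boundaries
    scores = [0] * (len(dotted) + 1)   # scores[i] is the value before dotted[i]
    for pattern in patterns:
        letters, values = parse_pattern(pattern)
        start = dotted.find(letters)
        while start != -1:
            for k, value in enumerate(values):
                scores[start + k] = max(scores[start + k], value)
            start = dotted.find(letters, start + 1)
    pieces, last = [], 0
    for j in range(left_min, len(word) - right_min + 1):
        if scores[j + 1] % 2 == 1:     # odd value allows a break before word[j]
            pieces.append(word[last:j])
            last = j
    pieces.append(word[last:])
    return pieces


catalan = ["u1ir.", "qu4ir."]
print(hyphenate("contribuir", catalan))                # ['contribu', 'ir']
print(hyphenate("delinquir", catalan))                 # ['delinquir']
# With the dash as a letter, "u1ir." no longer sees a word end ...
print(hyphenate("contribuir-hi", catalan))             # ['contribuir-hi']
# ... which is why a pattern such as "u1ir-" becomes necessary.
print(hyphenate("contribuir-hi", catalan + ["u1ir-"])) # ['contribu', 'ir-hi']

hypothetical = ["e1mu"]                                # invented pattern
print(hyphenate("emulatió", hypothetical))             # ['emulatió']
print(hyphenate("C-emulatió", hypothetical))           # ['C-e', 'mulatió']
```

In the last two calls the same hypothetical pattern is harmless for the
standalone word, because the break would leave a single letter at the
start, but it yields the unwanted "C-e" break once "C-" counts towards
the minimum at the left edge. This is one way such breaks can arise;
there may be others.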

For a complete set of rules for handling a dash, see for example:
http://tug.org/svn/texhyphen/trunk/hyph-utf8/tex/generic/hyph-utf8/patterns/tex/hyph-uk.tex?revision=567&view=markup
(Other European languages also have problems with compound words.)

For the apostrophe I'll write separately.

Mojca


