[tex-hyphen] new Catalan hyphenation patterns

Mojca Miklavec mojca.miklavec.lists at gmail.com
Mon Feb 18 11:18:16 CET 2013


Dear Jaume,

I have some more questions related to your submission of new patterns
for Catalan. (I have to clarify a few points to justify replacing the
default hyphenation patterns.)

1.) Did you also test these patterns with TeX, and what other tests did
you perform on them? I'm asking because I've spotted some potential
problems in the file (by eye, not by testing the patterns).

2.) Is the list of hyphenated words public? If so, may we include the
list of words in our repository to simplify testing of the patterns
now and in the future? Also: do you have any (possibly publicly
available) grammar excerpt explaining the hyphenation rules for
Catalan? (Gonçal also mentioned the use of a dictionary.) Any
dictionary would be helpful.

3.) There are some dashes (-) in your patterns. Can you please explain
a bit how they are used in your grammar? I'm asking because dashes are
tricky: if we want to treat dashes as letters, this might have
some unexpected consequences, so it needs to be done right and with
sufficient testing. Office might work differently in that respect (I'm
not sure though), so the testing would also have to be done in TeX.
Some languages like Russian include the dashes, but they include *a
lot* of simple patterns with dashes. In other languages hyphenation of
compound words can be done in a different way (at least in TeX, I'm
not sure how Office deals with the problem).

4.) Lots of patterns like ".de3s4ar." are in essence just "hyphenation
exceptions". (But since Open/LibreOffice doesn't allow specifying
hyphenation exceptions, I understand the reasoning behind this form of
specifying exceptions.)
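
To illustrate why such a pattern acts as an exception, here is a minimal
Python sketch of how Liang-style patterns are applied. It is not TeX's or
LibreOffice's actual implementation, and the names parse_pattern and
hyphenate are made up for this example. Digits are inter-letter values
(odd allows a break, even forbids it), and the dots anchor the pattern to
the word boundaries, so ".de3s4ar." can only match the complete word.

```python
def parse_pattern(pattern):
    """Split a Liang-style pattern into its letters and inter-letter values."""
    letters = []
    values = [0]  # values[k] is the value of the gap before letters[k]
    for ch in pattern:
        if ch.isdigit():
            values[-1] = int(ch)
        else:
            letters.append(ch)
            values.append(0)
    return "".join(letters), values


def hyphenate(word, patterns, lefthyphenmin=1, righthyphenmin=1):
    """Apply the patterns to one word and return it with '-' at allowed breaks."""
    text = "." + word + "."  # dots mark the word boundaries
    scores = [0] * (len(text) + 1)
    for pattern in patterns:
        letters, values = parse_pattern(pattern)
        start = text.find(letters)
        while start != -1:
            # The highest value seen at each gap wins.
            for offset, value in enumerate(values):
                scores[start + offset] = max(scores[start + offset], value)
            start = text.find(letters, start + 1)
    pieces, last = [], 0
    for i in range(lefthyphenmin, len(word) - righthyphenmin + 1):
        # scores[i + 1] is the gap before word[i] (shifted by the leading dot).
        if scores[i + 1] % 2 == 1:
            pieces.append(word[last:i])
            last = i
    pieces.append(word[last:])
    return "-".join(pieces)


print(hyphenate("desar", [".de3s4ar."]))     # the pattern matches the whole word
print(hyphenate("desarmar", [".de3s4ar."]))  # no match, so no break is introduced
```

With only this one pattern loaded, the longer word is left untouched, which
is what makes the pattern behave like a single-word exception.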

5.) Did you try to play with patgen? (It's not necessary, I'm just
curious.) It might greatly reduce the number of additional patterns.

> (left,right)-hyphenmin can be now (1,1), which allows valid hyphenations
> like "l'e-mulació", although "e-mulació" is generally undesirable and
> avoided.

6.) I haven't tried it yet, but how would your patterns behave on the
word "emulació" alone? I didn't understand from this sentence how
exactly such "problems" are dealt with in your patterns when
lefthyphenmin is set to 1. Would the word be hyphenated as e-mulació,
with the argument being that this word would never occur alone?
(Patterns for many languages treat the apostrophe as a letter, but then
again a whole lot of new patterns are needed to properly account for
that. Having apostrophes in patterns is just as problematic as having
dashes.)
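
To make the concern concrete, here is a small Python sketch of how
(left,right)-hyphenmin filters break points. The function name
apply_hyphenmin is hypothetical, and the single candidate break after the
"e" is taken from the quoted "e-mulació" rather than from running the new
patterns, so it is an assumption for illustration only.

```python
def apply_hyphenmin(word, breaks, lefthyphenmin, righthyphenmin):
    """Keep only breaks leaving at least lefthyphenmin letters before
    and righthyphenmin letters after the break."""
    allowed = [
        i for i in sorted(breaks)
        if lefthyphenmin <= i <= len(word) - righthyphenmin
    ]
    pieces, last = [], 0
    for i in allowed:
        pieces.append(word[last:i])
        last = i
    pieces.append(word[last:])
    return "-".join(pieces)


# Assumed candidate break: after the first letter, as in "e-mulació".
candidate_breaks = {1}

print(apply_hyphenmin("emulació", candidate_breaks, 1, 1))  # break is kept
print(apply_hyphenmin("emulació", candidate_breaks, 2, 1))  # break is dropped
```

So with lefthyphenmin set to 1, nothing outside the patterns suppresses
the break after a single letter; the patterns themselves would have to
forbid it.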

Thank you,
    Mojca

On Wed, Dec 19, 2012 at 9:52 AM, Jaume Ortolà i Font
<jaumeortola at gmail.com> wrote:
> Hi,
>
> I have created a new Catalan hyphenation file (see attachment).
>
> The new patterns have been checked against a dictionary (Gran Diccionari de
> la Llengua Catalana, Enciclopèdia Catalana, www.diccionari.cat), with 100%
> valid results. The patterns cover all the exceptions, except for one word
> that can be hyphenated in two ways depending on its meaning (àcid
> per-iò-dic, un pe-ri-ò-dic). There remain a dozen rare words that are
> unclear even for the mentioned dictionary redactors, which I have consulted.
>
> (left,right)-hyphenmin can be now (1,1), which allows valid hyphenations
> like "l'e-mulació", although "e-mulació" is generally undesirable and
> avoided.
>
> The current Catalan hyphenation file contains 895 patterns, and the new one
> 1499.
>
> Please, mention my contribution as "Jaume Ortolà i Font, 2012
> (www.riuraueditors.cat), jaumeortola at gmail.com".
>
> These patterns are already being distributed in other formats
> (Libre/OpenOffice).
>
> Regards,
> Jaume Ortolà
> www.riuraueditors.cat


