[tex-hyphen] hyphenation for Bulgarian language

Anton Zinoviev anton at lml.bas.bg
Sat Oct 21 22:57:22 CEST 2017


[I am sending a CC of this message to Georgi Boshnakov and Stoyan Dimitrov]

Hello everybody,

As far as I understand, in order to accept new hyphenation patterns
you need:

1. to have permission to distribute them under a free license;
2. to have them encoded in UTF-8;
3. since the patterns are generated algorithmically, to have the
   script which generates them;
4. to have an analysis of the differences between the new patterns and
   the existing patterns by Mr. Boshnakov;
5. to have the opinion of Mr. Boshnakov about the new patterns.

I believe we have been able to satisfy all these requirements.  You have
already received a message from Mr. Boshnakov.  At the end of this
message you will find the URLs from which you can download a shell
script `hyph-bg.sh`, released under a permissive license, which can be
used in the following ways.

    hyph-bg.sh --help

This prints short usage instructions.

    hyph-bg.sh --doc-txt

This writes to the standard output a text about Bulgarian hyphenation,
including an analysis of the differences between the Bulgarian
hyphenation patterns by Mr. Boshnakov and the proposed new hyphenation
patterns.

If the system you use has pandoc installed, you can also use one of
the following options to get a more readable document:

    hyph-bg.sh --doc-html
    hyph-bg.sh --doc-latex
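
A minimal sketch of how one might choose between these options
automatically.  The function name `pick_doc_option` is hypothetical and
not part of `hyph-bg.sh`; it only checks whether a `pandoc` executable is
on the PATH and falls back to `--doc-txt` otherwise.

```shell
# Hypothetical helper, not part of hyph-bg.sh: print the documentation
# option to use, depending on whether pandoc can be found on the PATH.
pick_doc_option() {
    if command -v pandoc >/dev/null 2>&1; then
        printf '%s\n' --doc-html
    else
        printf '%s\n' --doc-txt
    fi
}

# Possible usage (assumes hyph-bg.sh is in the current directory):
#   ./hyph-bg.sh "$(pick_doc_option)"
```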

In order to generate Bulgarian hyphenation patterns for TeX, the
following options should be used:

    hyph-bg.sh --safe-morphology --standalone-tex

Both the left and the right hyphen mins are 2.
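
To illustrate what hyphen mins of 2 mean, here is a small sketch.  The
function `apply_hyphen_mins` is hypothetical and has nothing to do with
how `hyph-bg.sh` or TeX are implemented; it takes a word in which the
candidate break points are marked with "-" and keeps only those breaks
that leave at least the given number of letters before and after them.
The example word is an illustrative Latin-letter one; in a shell that
counts bytes rather than characters, the sketch is only reliable for
ASCII input.

```shell
# Hypothetical illustration of left/right hyphen mins.
# Usage: apply_hyphen_mins WORD LEFT RIGHT
# WORD has candidate breaks marked with "-", e.g. "o-pe-ra".
# The body runs in a subshell so its variables stay local.
apply_hyphen_mins() (
    marked=$1 left=$2 right=$3
    plain=$(printf '%s' "$marked" | tr -d '-')   # word without markers
    total=${#plain}                              # number of letters
    out="" seen=0 rest=$marked
    while [ -n "$rest" ]; do
        ch=${rest%"${rest#?}"}                   # first character of $rest
        rest=${rest#?}
        if [ "$ch" = "-" ]; then
            # seen letters precede this break, total - seen follow it
            if [ "$seen" -ge "$left" ] && [ $((total - seen)) -ge "$right" ]; then
                out="$out-"
            fi
        else
            out="$out$ch"
            seen=$((seen + 1))
        fi
    done
    printf '%s\n' "$out"
)

apply_hyphen_mins o-pe-ra 2 2    # prints: ope-ra
```

With both mins equal to 2, the break after the single leading letter is
dropped, while the break that leaves two letters at the end is kept.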

One important difference between the line-breaking algorithm used by
TeX and the line-breaking algorithms used by most other software is
that TeX's algorithm is smart (it chooses the line breaks for a whole
paragraph at once) and can produce good results even with fewer
hyphenation possibilities.  Because of this, with TeX it makes sense to
use hyphenation patterns which separate the words only in the preferred
positions.  On the other hand, with software using a dumb line-breaking
algorithm, it is perhaps preferable to use hyphenation patterns which
provide more hyphenation possibilities.

If it is possible to provide two different sets of Bulgarian
hyphenation patterns, then the other software (not TeX!) should use
patterns produced in the following way:

    hyph-bg.sh --no-hyphen-mins

(The option --no-hyphen-mins is needed because the current versions of
Mozilla ignore the hyphen mins in words containing a dash.)

The following are the URLs from which you can download the script
`hyph-bg.sh` and the results produced by it.

The script itself:

    http://logic.fmi.uni-sofia.bg/hyphenation/hyph-bg.sh

Documentation about Bulgarian hyphenation:

    http://logic.fmi.uni-sofia.bg/hyphenation/hyph-bg.html

The same in PDF format:

    http://logic.fmi.uni-sofia.bg/hyphenation/hyph-bg.pdf

Hyphenation patterns for TeX:

    http://logic.fmi.uni-sofia.bg/hyphenation/hyph-bg.tex

Hyphenation patterns for other software:

    http://logic.fmi.uni-sofia.bg/hyphenation/hyph-bg.other

Regards,
Anton Zinoviev

