[tex-hyphen] hyphenation for Bulgarian language
Anton Zinoviev
anton at lml.bas.bg
Sat Oct 21 22:57:22 CEST 2017
[I am sending a CC of this message to Georgi Boshnakov and Stoyan Dimitrov]
Hello everybody,
As far as I understand, in order to accept new hyphenation patterns
you need:
1. to have permission to distribute them under a free license;
2. to have them encoded in UTF-8;
3. since the patterns are generated algorithmically, to have the
script which generates them;
4. to have an analysis of the differences between the new patterns and
the existing patterns by Mr. Boshnakov;
5. to have the opinion of Mr. Boshnakov about the new patterns.
I believe we have satisfied all these requirements. You have already
received a message from Mr. Boshnakov. At the end of this message you
will find URLs from which you can download a shell script
`hyph-bg.sh`, released under a permissive license, which can be used
in the following ways.
hyph-bg.sh --help
This prints short usage instructions.
hyph-bg.sh --doc-txt
This writes to the standard output a text about Bulgarian
hyphenation, including an analysis of the differences between the
Bulgarian hyphenation patterns by Mr. Boshnakov and the proposed new
hyphenation patterns.
If the system you use has pandoc installed, then you can also use one
of the following options to get an easier-to-read document:
hyph-bg.sh --doc-html
hyph-bg.sh --doc-latex
In order to generate Bulgarian hyphenation patterns for TeX, the
following options should be used:
hyph-bg.sh --safe-morphology --standalone-tex
Both the left and the right hyphen mins are 2.
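To illustrate what hyphen mins of 2 mean in practice, here is a
minimal Python sketch. It is not part of `hyph-bg.sh`: the function
name `apply_hyphen_mins` and the candidate break positions are
hypothetical and chosen only for illustration. A break is kept only
if at least 2 characters remain before it and at least 2 after it.

```python
def apply_hyphen_mins(word, breaks, left_min=2, right_min=2):
    """Keep only the break positions allowed by the hyphen mins.

    A break position i means the word may be split as
    word[:i] + "-" + word[i:]. The break is kept only if at least
    left_min characters precede it and right_min characters follow it.
    """
    return [i for i in sorted(breaks)
            if left_min <= i <= len(word) - right_min]


# Hypothetical candidate positions, not the output of the real patterns.
word = "азбука"
candidates = [1, 2, 4, 5]
print(apply_hyphen_mins(word, candidates))  # prints [2, 4]
```

With both mins set to 2, the candidates after the first letter and
before the last letter are dropped, leaving the splits "аз-бука" and
"азбу-ка".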
One important difference between the line-breaking algorithm used by
TeX and the line-breaking algorithm used by most other software is
that the algorithm of TeX is smart and can produce perfect results
even with fewer hyphenation possibilities. Because of this, with TeX
it makes sense to use hyphenation patterns which separate the words
only in the preferred positions. On the other hand, with software
using a dumb line-breaking algorithm, it is perhaps preferable to use
hyphenation patterns which provide more hyphenation possibilities.
If it is possible to provide two different sets of Bulgarian
hyphenation patterns, then the other software (not TeX!) should use
patterns produced in the following way:
hyph-bg.sh --no-hyphen-mins
(The option --no-hyphen-mins is needed because the current versions of
Mozilla ignore the hyphen mins in words containing a dash.)
The following are the URLs from which you can download the script
`hyph-bg.sh` and the results produced by it.
The script itself:
http://logic.fmi.uni-sofia.bg/hyphenation/hyph-bg.sh
Documentation about the Bulgarian hyphenation:
http://logic.fmi.uni-sofia.bg/hyphenation/hyph-bg.html
The same in PDF format:
http://logic.fmi.uni-sofia.bg/hyphenation/hyph-bg.pdf
Hyphenation patterns for TeX:
http://logic.fmi.uni-sofia.bg/hyphenation/hyph-bg.tex
Hyphenation patterns for other software:
http://logic.fmi.uni-sofia.bg/hyphenation/hyph-bg.other
Regards,
Anton Zinoviev