[tex-hyphen] Hyphenation patterns for Belarusian

Maksim Salau maksim.salau at gmail.com
Fri Sep 16 23:49:54 CEST 2016

Hi Arthur,

>   I would use my library Hydra:
> -- BEGIN --
> require 'hydra'
> hydra = Hydra.new
> hydra.setlefthyphenmin(2)
> hydra.setrighthyphenmin(2)
> hydra.ingest_file('/path/to/patterns.txt')
> File.read('/path/to/wordlist.txt').each_line do |line|
>   puts hydra.showhyphens(line.strip)
> end
> -- END --

Thanks! I've decided to use your library and ended up with almost the same script.
It accepts words either as arguments or from stdin.
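
The input handling can be sketched roughly as below. This is only an illustration, not the actual script: the helper names `words_from` and `show_hyphenation` are made up here, and the hyphenator is passed in as a callable so the sketch does not depend on how Hydra is installed.

```ruby
# Hypothetical helper: words given as arguments take precedence;
# otherwise one word per line is read from the input stream.
def words_from(args, input)
  source = args.empty? ? input.each_line : args
  source.map(&:strip).reject(&:empty?)
end

# Hypothetical helper: `hyphenate` is any callable that maps a word to its
# hyphenated form. With the Hydra object from the quoted script it could be
# ->(w) { hydra.showhyphens(w) }.
def show_hyphenation(args, input, output, hyphenate)
  words_from(args, input).each { |word| output.puts hyphenate.call(word) }
end
```

A script built on this would end with something like `show_hyphenation(ARGV, $stdin, $stdout, hyphenator)`, where `hyphenator` wraps the Hydra call.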

> > Also, is there any easy way to prohibit hyphenation of consonant-only endings/beginnings of a word?
> > I can remember a word with 3 consonants at the end.
> > Is generation of .ccc8 8ccc. patterns the only way to go? (patterns for 2 consonants are already in place)
>   Yes, and I would restrict that to lists of three consonants that
> actually do occur in Belarusian.

This is the hard part :) All combinations (both possible and impossible) take up a really huge amount of space. I'm considering parsing the hunspell dictionary to extract only the combinations that actually occur.
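
A minimal sketch of that idea, under a few assumptions: the consonant set below is only illustrative, the function name `edge_cluster_patterns` is made up, and the input is treated as hunspell .dic lines (`word/FLAGS`, with an entry count on the first line). Affixed forms are not expanded, so clusters that appear only in inflected forms would be missed.

```ruby
# Assumption: an illustrative consonant set, not an authoritative list.
CONSONANTS = 'бвгджзйклмнпрстўфхцчш'

# Collect ".ccc8" / "8ccc." patterns for consonant runs of the given size
# that really occur at the beginning or end of the listed words.
def edge_cluster_patterns(lines, size = 3)
  run = "[#{CONSONANTS}]{#{size}}"
  leading = /\A(#{run})/
  trailing = /(#{run})\z/
  patterns = []
  lines.each do |line|
    # Drop hunspell flags after "/"; the count line simply never matches.
    word = line.split('/').first.to_s.strip.downcase
    next if word.empty?
    head = word[leading, 1]
    tail = word[trailing, 1]
    patterns << ".#{head}8" if head
    patterns << "8#{tail}." if tail
  end
  patterns.uniq.sort
end
```

For example, `edge_cluster_patterns(File.readlines('/path/to/be.dic'))` (the path is a placeholder) would return a sorted list of unique patterns, ready to be appended to the pattern file.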

> > Licensing question is still open. I failed to contact Sviatlana and Alex answered nothing about switching to the MIT license.
>   I’ve seen you’ve made progress in the mean time from your private
> emails; however I’d like to mention that from what I see in your working
> repository, you have actually reimplemented the whole file from the
> specifications of the Belarusian Academy.  It is thus almost certain
> that you can rightfully call yourself the only copyright holder of the
> file; the only caveat is the list of 23 words you’ve copied from the
> OpenOffice file (whose author must be Sviatlana Liasovich since they are
> not in the LibreOffice file by Alex Buloichik).  However it is doubtful
> that one can really hold a copyright on a list of 23 words or substrings ...
>   That said, it is always courteous to acknowledge the contribution of
> previous developers, but I wouldn’t put their names in the copyright
> line.

Thanks. I'll reconsider that part.

Recently I tried the patterns with real TeX documents: https://github.com/msalau/hyph-be/tree/master/test-doc
I have 2 variants: one for T2A encoding (compiled with pdflatex) and one for UTF-8 (compiled with xelatex).

Here is what I've got:

1. document.t2a.tex is a UTF-8 document that uses babel (Belarusian is supported) with T2A encoding and is compiled with pdflatex.
Hyphenation works, but output from \showhyphens{} is unreadable:

> [] \T2A/cmr/m/n/10 ��-�-���� ��-��-����!

2. document.tex is a UTF-8 document that uses polyglossia and is compiled with xelatex.
Polyglossia doesn't support Belarusian.
Hyphenation doesn't work, but the output from \showhyphens{} is readable.

> Package polyglossia Warning: File gloss-belarusian.ldf does not exist!
> (polyglossia)                I will nevertheless try to use hyphenation patterns for belarusian. on input line 7.
> Underfull \hbox (badness 10000) in paragraph at lines 10--10
> [] \EU1/DejaVuSans(0)/m/n/10 Тэставы дакумент

If polyglossia is replaced with babel, hyphenation starts to work in the document, but not in the log:

> Package babel Warning: No input encoding specified for Belarusian language on input line 146.
> ))
> Underfull \hbox (badness 10000) in paragraph at lines 10--10
> [] \EU1/DejaVuSans(0)/m/n/10 Тэставы дакумент

Polyglossia is actually aware of the presence of hyphenation patterns for Belarusian, since it doesn't complain much.
Here is how it complains about a totally unknown language:

> Package polyglossia Warning: File gloss-foo-bar.ldf does not exist!
> (polyglossia)                I will nevertheless try to use hyphenation patterns for foo-bar. on input line 7.

> Package polyglossia Warning: \setlocalhyphenmin useless for unknown language foo-bar on input line 7.

> Package polyglossia Warning: No hyphenation patterns were loaded for `Foo-bar'
> (polyglossia)                I will use \language=\l@nohyphenation instead on input line 7.

Does anyone have any ideas about polyglossia? As far as I can see, polyglossia can access the hyphenation patterns
(it knows that they exist), but fails to load them for an unknown reason.

Best regards,
