[tex-hyphen] Hyphenation patterns for Belarusian
Maksim Salau
maksim.salau at gmail.com
Fri Sep 16 23:49:54 CEST 2016
Hi Arthur,
> I would use my library Hydra:
>
> -- BEGIN --
> require 'hydra'
>
> hydra = Hydra.new
> hydra.setlefthyphenmin(2)
> hydra.setrighthyphenmin(2)
> hydra.ingest_file('/path/to/patterns.txt')
> File.read('/path/to/wordlist.txt').each_line do |line|
> puts hydra.showhyphens(line.strip)
> end
> -- END --
Thanks! I've decided to use your library and ended up with almost the same script
https://github.com/msalau/hyph-be/blob/master/showhyphens.rb
It accepts words either as arguments or from stdin.
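A minimal sketch of the "arguments or stdin" input handling, not the actual contents of showhyphens.rb. The helper name `words_from` is hypothetical; each returned word would then be passed to `hydra.showhyphens` as in the quoted script.

```ruby
# Hypothetical helper: take words from the argument list if any were given,
# otherwise read them line by line from the input stream.
def words_from(argv, input)
  source = argv.empty? ? input.each_line : argv
  # Drop surrounding whitespace and skip blank entries.
  source.map(&:strip).reject(&:empty?)
end

# Intended use (assumes a configured Hydra instance named hydra):
#   words_from(ARGV, $stdin).each { |word| puts hydra.showhyphens(word) }
```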
> > Also, is there any easy way to prohibit hyphenation of consonant-only endings/beginnings of a word?
> > I can remember a word with 3 consonants at the end.
> > Is generation of .ccc8 8ccc. patterns the only way to go? (patterns for 2 consonants are already in place)
>
> Yes, and I would restrict that to lists of three consonants that
> actually do occur in Belarusian.
This is the hard part :) All combinations (both possible and impossible) take up a really huge amount of space. I'm considering parsing the hunspell dictionary to get only the combinations that actually occur.
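A minimal sketch of that idea, under stated assumptions: the .dic file has the entry count on its first line and one `word/FLAGS` entry per line after it; the ten letters in `VOWELS` are the Belarusian vowels; and the soft sign is counted together with the consonants. The names `dic_words` and `edge_patterns` are hypothetical, and affix expansion is ignored, so only the base forms in the dictionary are covered.

```ruby
require 'set'

# Assumption: these are the Belarusian vowel letters (kept for reference).
VOWELS = 'аеёіоуыэюя'
# Assumption: every other letter, including the soft sign, is a non-vowel.
NON_VOWELS = 'бвгджзйклмнпрстўфхцчшь'

# Extract bare lowercase words from hunspell .dic lines ("word/FLAGS").
# The first line is skipped because it holds the entry count.
def dic_words(lines)
  lines.drop(1)
       .map { |line| line.strip.split('/').first.to_s.downcase }
       .reject(&:empty?)
end

# Collect ".ccc8" and "8ccc." patterns for the n-letter non-vowel runs
# that really occur at the start or end of the given words.
def edge_patterns(words, n = 3)
  head = /\A[#{NON_VOWELS}]{#{n}}/
  tail = /[#{NON_VOWELS}]{#{n}}\z/
  patterns = Set.new
  words.each do |word|
    if (cluster = word[head])
      patterns << ".#{cluster}8"
    end
    if (cluster = word[tail])
      patterns << "8#{cluster}."
    end
  end
  patterns.to_a.sort
end

sample = ['3', 'страх/A', 'тэкст', 'мова/B']
puts edge_patterns(dic_words(sample))
```

For real data the sample array would be replaced by something like `File.readlines('/path/to/be_BY.dic', chomp: true)` (the path is a placeholder).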
>
> > Licensing question is still open. I failed to contact Sviatlana and Alex answered nothing about switching to the MIT license.
>
> I’ve seen you’ve made progress in the meantime from your private
> emails; however I’d like to mention that from what I see in your working
> repository, you have actually reimplemented the whole file from the
> specifications of the Belarusian Academy. It is thus almost certain
> that you can rightfully call yourself the only copyright holder of the
> file; the only caveat is the list of 23 words you’ve copied from the
> OpenOffice file (whose author must be Sviatlana Liasovich since they are
> not in the LibreOffice file by Alex Buloichik). However it is doubtful
> that one can really hold a copyright on a list of 23 words or substrings ...
>
> That said, it is always courteous to acknowledge the contribution of
> previous developers, but I wouldn’t put their names in the copyright
> line.
Thanks. I'll reconsider that part.
Recently I tried the patterns with real TeX documents: https://github.com/msalau/hyph-be/tree/master/test-doc
I have 2 variants: for T2A encoding (compiled with pdflatex) and for UTF-8 (compiled with xelatex).
Here is what I've got:
1. document.t2a.tex is a UTF-8 document that uses babel (Belarusian is supported) with T2A encoding and is compiled with pdflatex.
Hyphenation works, but the output from \showhyphens{} is unreadable:
> [] \T2A/cmr/m/n/10 ��-�-���� ��-��-����!
2. document.tex is a UTF-8 document that uses polyglossia and is compiled with xelatex.
Polyglossia doesn't support Belarusian.
Hyphenation doesn't work, but the output from \showhyphens{} is readable.
> Package polyglossia Warning: File gloss-belarusian.ldf does not exist!
> (polyglossia) I will nevertheless try to use hyphenation patterns for belarusian. on input line 7.
> Underfull \hbox (badness 10000) in paragraph at lines 10--10
> [] \EU1/DejaVuSans(0)/m/n/10 Тэставы дакумент
If polyglossia is replaced with babel, hyphenation starts to work in the document, but not in the log:
> Package babel Warning: No input encoding specified for Belarusian language on input line 146.
> ))
> Underfull \hbox (badness 10000) in paragraph at lines 10--10
> [] \EU1/DejaVuSans(0)/m/n/10 Тэставы дакумент
Polyglossia is actually aware of the presence of hyphenation patterns for Belarusian, since it doesn't complain much.
Here is how it complains about a totally unknown language:
> Package polyglossia Warning: File gloss-foo-bar.ldf does not exist!
> (polyglossia) I will nevertheless try to use hyphenation patterns for foo-bar. on input line 7.
> Package polyglossia Warning: \setlocalhyphenmin useless for unknown language foo-bar on input line 7.
> Package polyglossia Warning: No hyphenation patterns were loaded for `Foo-bar'
> (polyglossia) I will use \language=\l@nohyphenation instead on input line 7.
Does anyone have any ideas about polyglossia? As far as I can see, polyglossia can access the hyphenation patterns
(it knows that they exist), but fails to load them for an unknown reason.
Best regards,
Maksim.