[tex-hyphen] Hyphenation patterns for Belarusian
arthur.reutenauer at normalesup.org
Fri Sep 16 22:28:29 CEST 2016
Sorry for the late answer; this may still be useful:
> Do you know a way that doesn't involve system-wide installation of patterns and generation of a full-blown *TeX document?
> I had in mind something like this:
> % -- BEGIN --
> \input unicode-letters
> \input hyph-be
> % Some TeX magic
> \input word-list.txt
> % Some more TeX magic
> % -- END --
> And result printed to standard output.
I would use my library Hydra:
-- BEGIN --
hydra = Hydra.new
File.read('/path/to/wordlist.txt').each_line do |line|
-- END --
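For readers who just want to see the principle, here is a minimal,
self-contained sketch of Liang-style pattern hyphenation in plain Ruby.
It does not use Hydra's API: the class ToyHyphenator, its method names
and the single pattern 'а1б' are made up for illustration and are not
taken from hyph-be. It treats the apostrophe as an ordinary letter and
applies left and right minimums of 2.

```ruby
# Toy hyphenator in the style of Liang's algorithm (hypothetical, not Hydra).
class ToyHyphenator
  def initialize(patterns, left_min: 2, right_min: 2)
    @left_min = left_min
    @right_min = right_min
    @patterns = {}
    patterns.each do |pattern|
      letters = pattern.gsub(/\d/, '')
      # values[i] is the digit standing before letters[i]; the last slot is
      # the digit after the final letter. Missing digits count as 0.
      values = [0] * (letters.length + 1)
      index = 0
      pattern.each_char do |char|
        if char =~ /\d/
          values[index] = char.to_i
        else
          index += 1
        end
      end
      @patterns[letters] = values
    end
  end

  def hyphenate(word, separator = '-')
    dotted = ".#{word.downcase}."          # dots mark the word boundaries
    scores = [0] * (dotted.length + 1)     # scores[j] is the gap before dotted[j]
    (0...dotted.length).each do |start|
      (1..(dotted.length - start)).each do |len|
        values = @patterns[dotted[start, len]]
        next unless values
        # Keep the highest digit seen at each gap.
        values.each_with_index do |value, offset|
          scores[start + offset] = [scores[start + offset], value].max
        end
      end
    end
    result = +''
    word.each_char.with_index do |char, i|
      # The gap before word[i] is scores[i + 1] because of the leading dot.
      # An odd score allows a break, provided both minimums are respected.
      if i >= @left_min && word.length - i >= @right_min && scores[i + 1].odd?
        result << separator
      end
      result << char
    end
    result
  end
end

hyphenator = ToyHyphenator.new(['а1б'])   # assumed toy pattern
word_list = "абаб'ю\n"                     # stands in for the word list file
word_list.each_line { |line| puts hyphenator.hyphenate(line.strip, '=') }
# prints: аба=б'ю
```

Adding a second toy pattern such as 'ба2б' puts an even digit on the same
gap, which outranks the 1 and suppresses the break; that is how the
patterns inhibit hyphenation points.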
> I got such work-flow with https://github.com/hunspell/hyphen
> But I'm not sure if this library produces the same output as *TeX does.
I'm sure it doesn't ;-) It was designed to provide functionality
equivalent to TeX’s hyphenation routine but has a slightly different
implementation; in the general case you can’t assume that the same set
of patterns will yield the same hyphenation points with TeX and
libhyphen. (One could even say it uses a slightly different algorithm
based on the same ideas.) In addition, there are slight variations in
the configuration that may explain what you observe:
> And here is my first result that works not as expected:
> left hyphen min = right hyphen min = 2
> а б а б ' ю
> So I expect to get аба=б'ю, but the library says there are no valid hyphenation points in the word.
> Adding rule .аба3б doesn't help. The only explanation I can imagine is that the library doesn't treat quote as part of the alphabet and hyphenates the word as two separate words. The word абаб'юц=ца proves that theory.
That's a very likely explanation indeed, and I would call this kind of
problem a configuration issue. In addition, the fact that the patterns
need “preparation” for use with libhyphen means that you can’t use
your patterns with it as they are and expect the same hyphenation as
TeX. See README.hyphen in the libhyphen repository for details.
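The word-splitting theory can be checked with simple arithmetic on the
two minimums, independently of any patterns. The sketch below is only a
model of that theory, not of libhyphen's actual tokenisation: the method
permitted_positions and the two letter classes are hypothetical names,
and the only rule encoded is "left hyphen min = right hyphen min = 2,
applied to each run of letters separately".

```ruby
# Character positions at which the minimums alone would permit a break,
# given a regexp fragment describing what counts as a letter.
def permitted_positions(text, letter_class, left_min: 2, right_min: 2)
  positions = []
  text.scan(/#{letter_class}+/) do
    match = Regexp.last_match
    start = match.begin(0)       # character offset of this run of letters
    length = match[0].length
    # Inside a run, a break needs left_min letters before it and
    # right_min letters after it.
    (left_min..(length - right_min)).each { |i| positions << start + i }
  end
  positions
end

WITH_APOSTROPHE    = "[\\p{L}']"  # apostrophe is part of the alphabet
WITHOUT_APOSTROPHE = "\\p{L}"     # apostrophe splits the word in two

p permitted_positions("абаб'ю", WITH_APOSTROPHE)       # => [2, 3, 4]
p permitted_positions("абаб'ю", WITHOUT_APOSTROPHE)    # => [2]
p permitted_positions("абаб'юцца", WITHOUT_APOSTROPHE) # => [2, 7]
```

Position 3 is the expected аба=б'ю. It is permitted only when the
apostrophe counts as a letter; once the word is split, "абаб" allows
position 2 alone and "ю" is too short for any break. In абаб'юцца the
second run "юцца" is long enough to allow position 7, which is exactly
абаб'юц=ца.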
> Does anyone have any experience with the library?
Not much more than the theoretical knowledge I summarise above, to be
honest (and one test case that made the problem evident), but I’d still
recommend using my library, and reporting any potential difference you
notice, because then it’s a bug and I’d like to fix it :-) You can of
course report any other issue, if applicable.