[tex-hyphen] Hyphenation patterns for Belarusian
maksim.salau at gmail.com
Mon Aug 29 05:50:53 CEST 2016
Thank you for the detailed explanation.
But unfortunately the test script doesn't work for me.
I tried it with TeXLive 2014.20141024-2 without success (unicode-letters.def is not shipped with it) and with the most recent vanilla version:
/usr/local/texlive/2016/bin/x86_64-linux/xetex -ini -etex test-hyph-be.tex
This is XeTeX, Version 3.14159265-2.6-0.99996 (TeX Live 2016) (INITEX)
restricted \write18 enabled.
entering extended mode
! Use of \XeTeXcheck doesn't match its definition.
<inserted text> .9
! Emergency stop.
<inserted text> .9
No pages of output.
Transcript written on test-hyph-be.log.
My understanding of TeX is not good enough to track down the cause from the message and the sources.
Here is the code XeTeX complains about (starting from line 63):
On Sun, 28 Aug 2016 15:12:48 +0100
Arthur Reutenauer <arthur.reutenauer at normalesup.org> wrote:
> Hi Maksim,
> First of all thank you for your efforts, although I would say you’re
> trying to do a little too much at this stage, I’ll explain why at the
> > ! Conflicting pattern ignored.
> > l.6024 }
> > ?
> > ! Emergency stop.
> > l.6024 }
> > ! ==> Fatal error occurred, no output PDF file produced!
> > Transcript written on luatex.log.
> > Is there any way to make it more verbose? Or debug the issue somehow?
> You can’t really make it more verbose with LuaTeX, but debugging the
> issue is easy: conflicting patterns (called “duplicate patterns” by
> XeTeX and other engines) are patterns where the underlying character
> strings are the same, for example a1b and a2b. If you generate formats
> for XeTeX instead of LuaTeX, it gives you the exact line number where
> the offending pattern is found -- i.e., the second occurrence, which
> should help you find the first one.
> Using that technique I found a number of conflicts such as б1ь and
> б8ь, в1ь and в8ь, as well as а1й and а8й, а1ў and а8ў, and the more
> intriguing pairs 1’2а and ’3а, 1’2е and ’3е, etc. This makes me suspect
> that the patterns haven’t been developed with great care.
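The check described above can also be done outside TeX. The following is a minimal sketch in Python, not part of hyph-utf8: it assumes the patterns have already been extracted from hyph-be.tex into a plain list, and the function name find_conflicts is made up for illustration. It strips the digits from each pattern and reports every letter string that occurs more than once, together with the position of each occurrence.

```python
import re
from collections import defaultdict


def find_conflicts(patterns):
    """Return {letters: [(position, pattern), ...]} for every letter
    string that is shared by more than one pattern.

    Two patterns conflict when they are identical once the digits
    are removed, e.g. a1b and a2b both reduce to "ab".
    """
    groups = defaultdict(list)
    for position, pattern in enumerate(patterns, start=1):
        letters = re.sub(r"[0-9]", "", pattern)  # keep dots and letters
        groups[letters].append((position, pattern))
    return {k: v for k, v in groups.items() if len(v) > 1}


if __name__ == "__main__":
    # Pairs quoted in the message above, plus one harmless pattern.
    sample = ["б1ь", "б8ь", "а1й", "а8й", "1’2а", "’3а", "а1б"]
    for letters, hits in find_conflicts(sample).items():
        listing = ", ".join(f"{p} (#{n})" for n, p in hits)
        print(f"{letters}: {listing}")
```

The positions are indices into the list, not line numbers in hyph-be.tex; mapping them back to lines depends on how the patterns were extracted.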
> > Also, please clarify the usage of quotes for me. There are 3 symbols used in hyph-be.tex: ' ` ’
> > I suspect this can confuse the engine, since generate-plain-patterns.rb checks only the first one and converts it to the third one to populate hyph-quote-<lang>.tex
> > What is the official position on quotes? Should one use only ' and *TeX will do the rest, or are other symbols allowed too?
> Any symbol is allowed in a hyphenation pattern for TeX as long as you
> set its \lccode correctly, which is done in a file called
> unicode-letters.def, or later within hyph-utf8. If the characters don’t
> have a correct \lccode, you get an error from TeX saying “Non-letter”,
> and since you’re not reporting anything like that, your system seems to
> be set up correctly from that point of view.
> However, TeX won’t treat the different types of apostrophes in any
> special way, there are no equivalence tables or anything like that. To
> the engine, the different Unicode characters for the apostrophe are
> simply that, different characters. We enforce equivalences such as the
> one between ' and ’ by duplicating every pattern containing an
> apostrophe and putting it in the hyph-quote-* files as you’ve seen, so
> in your case we could do that by putting all patterns with ` and ’ in
> hyph-quote-be.tex, and the patterns with ' in the main file. We can
> update the Ruby scripts to do that.
> The reason for having only one type of apostrophe in the main file
> (hyph-be.tex) is so that other programs that have a notion of
> equivalence won’t get confused; this is not about TeX (at least not
> about UTF-8 TeX, see below).
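A rough sketch of that split in Python follows. It is not the Ruby code from hyph-utf8; the function name split_apostrophes is hypothetical, and treating ` as an apostrophe variant is an assumption based on the three symbols currently found in hyph-be.tex. Every pattern is first normalised to the ASCII apostrophe for the main file, then each pattern containing an apostrophe is duplicated once per variant for hyph-quote-be.tex.

```python
def split_apostrophes(patterns, variants=("\u2019", "`")):
    """Split patterns into (main, quote) lists.

    main  : all patterns, with every apostrophe variant replaced by '
    quote : copies of the apostrophe patterns, one per variant
    """
    main = []
    for pattern in patterns:
        normalised = pattern
        for variant in variants:
            normalised = normalised.replace(variant, "'")
        if normalised not in main:  # drop exact duplicates only
            main.append(normalised)

    quote = [
        pattern.replace("'", variant)
        for pattern in main
        if "'" in pattern
        for variant in variants
    ]
    return main, quote


if __name__ == "__main__":
    main, quote = split_apostrophes(["1'2a", "\u20193b", "c1d"])
    print("main: ", main)
    print("quote:", quote)
```

Normalising can itself produce two patterns with the same letters but different digits; those are kept as they are here and would still have to be resolved by hand.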
> > The third issue with these patterns is the T2A encoding. The U+2019 symbol (the third quote in the list above) makes conversion impossible, since the symbol is not mapped in the converter. I tried to enable it in t2a.dat and regenerate the converter, but it fails with the message: The encoding t2a uses more than two bytes to encode characters.
> Yes, of course, in T2A there is only one character slot for the
> apostrophe, so you shouldn’t try and map all the different characters
> one-to-one. This is precisely where the strategy explained in the
> paragraph above helps: if you extract all the different types of
> apostrophes to an auxiliary file and keep only one in the main file, you
> can work around that problem. That said, do you really need to use the
> patterns in an 8-bit encoding?
> In conclusion, I think you should try and test the patterns first; you
> don’t need any of the machinery that hyph-utf8 provides, but for example
> ---- BEGIN test-hyph-be.tex
> \input unicode-letters.def
> \input hyph-be
> % Your text here
> ---- END test-hyph-be.tex
> to be compiled with “xetex -ini -etex test-hyph-be.tex”. We’ll do the
> packaging later.