[tex-hyphen] Hyphenation patterns for Belarusian

Mon Aug 29 05:50:53 CEST 2016

Hi Arthur,

Thank you for the detailed explanation.

Unfortunately, the test script doesn't work for me.
I tried it with TeX Live 2014.20141024-2, which failed because unicode-letters.def is not shipped with it, and with the most recent vanilla version (TeX Live 2016):

/usr/local/texlive/2016/bin/x86_64-linux/xetex -ini -etex test-hyph-be.tex
This is XeTeX, Version 3.14159265-2.6-0.99996 (TeX Live 2016) (INITEX)
 restricted \write18 enabled.
entering extended mode
(./test-hyph-be.tex
(/usr/local/texlive/2016/texmf-dist/tex/plain/config/unicode-letters.def
! Use of \XeTeXcheck doesn't match its definition.
<inserted text> .9
                  9996
l.64 ...ifnum\expandafter\XeTeXcheck\XeTeXrevision
                                                  .-\relax>996 %
? 
! Emergency stop.
<inserted text> .9
                  9996
l.64 ...ifnum\expandafter\XeTeXcheck\XeTeXrevision
                                                  .-\relax>996 %
No pages of output.
Transcript written on test-hyph-be.log.

My understanding of TeX is not good enough to track down the cause from the message and the sources.
Here is the code XeTeX complains about (starting from line 63 of unicode-letters.def):

    \def\XeTeXcheck.#1.#2-#3\relax{#1}
     \ifnum\expandafter\XeTeXcheck\XeTeXrevision.-\relax>996 %
       \def\XeTeXcheck#1{}
     \else
       \def\XeTeXcheck#1{%
          \ifnum"#1>"FFFF %
            \long\def\XeTeXcheck##1\endgroup{\endgroup}
            \expandafter\XeTeXcheck
          \fi
       }
     \fi
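
One possible cause, offered as an unverified guess rather than a diagnosis: with -ini the character # still has category code 12 (only \ and % get special codes in IniTeX), so \def\XeTeXcheck.#1.#2-#3\relax{#1} would define a macro without parameters whose delimiter text is the literal characters ".#1.#2-#3". On use, TeX would then expect "#" after the first "." but find "9", which is consistent with the ".9 / 9996" context in the log. A sketch of the test file under that assumption follows; the only added line is the \catcode for #, I have not run it, and unicode-letters.def may need further category codes that I am not aware of.

```tex
% test-hyph-be.tex, adjusted (untested sketch)
\catcode`\{=1
\catcode`\}=2
% Assumption: unicode-letters.def needs # as the macro parameter character
% so that its \def\XeTeXcheck.#1.#2-#3\relax{#1} parses as intended.
\catcode`\#=6
\input unicode-letters.def
\lccode`\'=`\'
\lccode`\`=`\`
\lccode`\’=`\’
\input hyph-be
% Your text here
```

The command line would stay the same as above: xetex -ini -etex test-hyph-be.tex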

Best regards,
Maksim.

On Sun, 28 Aug 2016 15:12:48 +0100
Arthur Reutenauer <arthur.reutenauer at normalesup.org> wrote:

> 	Hi Maksim,
> 
>   First of all thank you for your efforts, although I would say you’re
> trying to do a little too much at this stage, I’ll explain why at the
> end.
> 
> > ! Conflicting pattern ignored.
> > l.6024 }
> >       
> > ? 
> > ! Emergency stop.
> > l.6024 }
> >       
> > !  ==> Fatal error occurred, no output PDF file produced!
> > Transcript written on luatex.log.
> > 
> > Is there any way to make it more verbose? Or debug the issue somehow?
> 
>   You can’t really make it more verbose with LuaTeX, but debugging the
> issue is easy: conflicting patterns (called “duplicate patterns” by
> XeTeX and other engines) are patterns where the underlying character
> strings are the same, for example a1b and a2b.  If you generate formats
> for XeTeX instead of LuaTeX, it gives you the exact line number where
> the offending pattern is found -- i. e., the second occurrence, which
> should help you find the first one.
> 
>   Using that technique I found a number of conflicts such as б1ь and
> б8ь, в1ь and в8ь, as well as а1й and а8й, а1ў and а8ў, and the more
> intriguing pairs 1’2а and ’3а, 1’2е and ’3е, etc.  This makes me suspect
> that the patterns haven’t been developed with great care.
> 
> > Also, please clarify the usage of quotes for me. There are 3 symbols used in hyph-be.tex: ' ` ’
> > I suspect this can confuse the engine, since generate-plain-patterns.rb checks only the first one and converts it to the third one to populate hyph-quote-<lang>.tex
> > What is the official position on quotes? Should one use only ' and let *TeX do the rest, or are other symbols allowed too?
> 
>   Any symbol is allowed in a hyphenation pattern for TeX as long as you
> set its \lccode correctly, which is done in a file called
> unicode-letters.def, or later within hyph-utf8.  If the characters don’t
> have a correct \lccode, you get an error from TeX saying “Non-letter”,
> and since you’re not reporting anything like that, your system seems to
> be set up correctly from that point of view.
> 
>   However, TeX won’t treat the different types of apostrophes in any
> special way, there are no equivalence tables or anything like that.  To
> the engine, the different Unicode characters for the apostrophe are
> simply that, different characters.  We enforce equivalences such as the
> one between ' and ’ by duplicating every pattern containing an
> apostrophe and putting it in the hyph-quote-* files as you’ve seen, so
> in your case we could do that by putting all patterns with ` and ’ in
> hyph-quote-be.tex, and the patterns with ' in the main file.  We can
> update the Ruby scripts to do that.
> 
>   The reason for having only one type of apostrophe in the main file
> (hyph-be.tex) is so that other programs that have a notion of
> equivalence won’t get confused; this is not about TeX (at least not
> about UTF-8 TeX, see below).
> 
> > The third issue with these patterns is the T2A encoding. The U+2019 symbol (the third quote from the list above) makes conversion impossible, since the symbol is not mapped in the converter. I tried to enable it in t2a.dat and regenerate the converter, but it fails with the message: The encoding t2a uses more than two bytes to encode characters.
> 
>   Yes, of course, in T2A there is only one character slot for the
> apostrophe, so you shouldn’t try and map all the different characters
> one-to-one.  This is precisely where the strategy explained in the
> paragraph above helps: if you extract all the different types of
> apostrophes to an auxiliary file and keep only one in the main file, you
> can work around that problem.  That said, do you really need to use the
> patterns in an 8-bit encoding?
> 
>   In conclusion, I think you should try and test the patterns first; you
> don’t need any of the machinery that hyph-utf8 provides, but for example
> just
> 
> ---- BEGIN test-hyph-be.tex
> \catcode`\{=1
> \catcode`\}=2
> \input unicode-letters.def
> \lccode`\'=`\'
> \lccode`\`=`\`
> \lccode`\’=`\’
> \input hyph-be
> % Your text here
> ---- END test-hyph-be.tex
> 
> to be compiled with “xetex -ini -etex test-hyph-be.tex”.  We’ll do the
> packaging later.
> 
> 	Best,
> 
> 		Arthur