[tex-hyphen] Hyphenation patterns for Belarusian
Maksim Salau
maksim.salau at gmail.com
Mon Aug 29 05:50:53 CEST 2016
Hi Arthur,
Thank you for the detailed explanation.
Unfortunately, the test script doesn't work for me.
I tried it with TeX Live 2014.20141024-2, without success (unicode-letters.def is not shipped with it), and with the most recent vanilla version:
/usr/local/texlive/2016/bin/x86_64-linux/xetex -ini -etex test-hyph-be.tex
This is XeTeX, Version 3.14159265-2.6-0.99996 (TeX Live 2016) (INITEX)
restricted \write18 enabled.
entering extended mode
(./test-hyph-be.tex
(/usr/local/texlive/2016/texmf-dist/tex/plain/config/unicode-letters.def
! Use of \XeTeXcheck doesn't match its definition.
<inserted text> .9
9996
l.64 ...ifnum\expandafter\XeTeXcheck\XeTeXrevision
.-\relax>996 %
?
! Emergency stop.
<inserted text> .9
9996
l.64 ...ifnum\expandafter\XeTeXcheck\XeTeXrevision
.-\relax>996 %
No pages of output.
Transcript written on test-hyph-be.log.
My understanding of TeX isn't good enough to track down the cause from the error message and the sources.
Here is the code XeTeX complains about (starting at line 63 of unicode-letters.def):
\def\XeTeXcheck.#1.#2-#3\relax{#1}
\ifnum\expandafter\XeTeXcheck\XeTeXrevision.-\relax>996 %
\def\XeTeXcheck#1{}
\else
\def\XeTeXcheck#1{%
\ifnum"#1>"FFFF %
\long\def\XeTeXcheck##1\endgroup{\endgroup}
\expandafter\XeTeXcheck
\fi
}
\fi
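For reference, here is a rough Python model of what this check appears to compute (xetex_major_revision and keeps_bmp_guard are hypothetical names, not anything shipped with TeX Live). \XeTeXcheck takes the digits between the leading dot of \XeTeXrevision (".99996" here) and the next dot. Only when that number is not greater than 996 does the \else branch define a version of \XeTeXcheck that seems to skip the settings for characters above U+FFFF.

```python
def xetex_major_revision(revision: str) -> int:
    """Model of \\XeTeXcheck.#1.#2-#3\\relax applied to '<revision>.-\\relax'.

    #1 is the text between the leading '.' and the next '.'.
    """
    text = revision + ".-"  # the call site appends ".-" before \relax
    if not text.startswith("."):
        # TeX would report: Use of \XeTeXcheck doesn't match its definition.
        raise ValueError("revision does not start with '.'")
    first, _rest = text[1:].split(".", 1)
    return int(first)


def keeps_bmp_guard(revision: str) -> bool:
    """True if the \\else branch runs, i.e. the revision number is <= 996."""
    return not xetex_major_revision(revision) > 996


for rev in (".996", ".9997", ".99996"):
    print(rev, xetex_major_revision(rev), keeps_bmp_guard(rev))
```

In the log, TeX gives up right after reading ".9", i.e. on the first digit after the leading dot, whereas the model takes all the digits. One possible explanation, offered as an untested guess: with xetex -ini no format is loaded, so # still has category code 12 rather than 6, and the \def on line 63 then treats ".#1.#2-#3" as literal delimiter text instead of parameters. If so, adding \catcode`\#=6 next to the \catcode lines for { and } at the top of the test file might get past this error.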
Best regards,
Maksim.
On Sun, 28 Aug 2016 15:12:48 +0100
Arthur Reutenauer <arthur.reutenauer at normalesup.org> wrote:
> Hi Maksim,
>
> First of all, thank you for your efforts, although I would say you’re
> trying to do a little too much at this stage; I’ll explain why at the
> end.
>
> > ! Conflicting pattern ignored.
> > l.6024 }
> >
> > ?
> > ! Emergency stop.
> > l.6024 }
> >
> > ! ==> Fatal error occurred, no output PDF file produced!
> > Transcript written on luatex.log.
> >
> > Is there any way to make it more verbose, or to debug the issue somehow?
>
> You can’t really make it more verbose with LuaTeX, but debugging the
> issue is easy: conflicting patterns (called “duplicate patterns” by
> XeTeX and other engines) are patterns where the underlying character
> strings are the same, for example a1b and a2b. If you generate formats
> for XeTeX instead of LuaTeX, it gives you the exact line number where
> the offending pattern is found (i.e., the second occurrence), which
> should help you find the first one.
>
> Using that technique I found a number of conflicts such as б1ь and
> б8ь, в1ь and в8ь, as well as а1й and а8й, а1ў and а8ў, and the more
> intriguing pairs 1’2а and ’3а, 1’2е and ’3е, etc. This makes me suspect
> that the patterns haven’t been developed with great care.
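The same check can be run outside TeX so that both members of each conflicting pair show up at once. This is a minimal Python sketch, assuming hyph-be.tex keeps \patterns{ and its closing } on lines of their own, and that a pattern's underlying string is simply the pattern with its digits removed; read_patterns, underlying_string and find_conflicts are hypothetical helper names.

```python
import re
from collections import defaultdict


def read_patterns(lines):
    """Yield (line number, pattern) for each pattern inside \\patterns{...}.

    Assumes "\\patterns{" and the closing "}" each sit on their own line.
    """
    inside = False
    for number, line in enumerate(lines, start=1):
        line = line.split("%", 1)[0].strip()  # drop TeX comments
        if line.startswith("\\patterns{"):
            inside = True
        elif inside and line == "}":
            inside = False
        elif inside:
            for pattern in line.split():
                yield number, pattern


def underlying_string(pattern):
    """The characters a pattern matches: the pattern without its digits."""
    return re.sub(r"[0-9]", "", pattern)


def find_conflicts(numbered_patterns):
    """Map each underlying string that occurs more than once to its occurrences."""
    groups = defaultdict(list)
    for number, pattern in numbered_patterns:
        groups[underlying_string(pattern)].append((number, pattern))
    return {key: hits for key, hits in groups.items() if len(hits) > 1}


sample = ["\\patterns{", "б1ь", "б8ь", "1’2а", "’3а", "а1й", "}"]
for key, hits in find_conflicts(read_patterns(sample)).items():
    print(key, hits)
```

On the real file this would be called as find_conflicts(read_patterns(open("hyph-be.tex", encoding="utf-8"))).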
>
> > Also, please clarify the usage of quotes for me. There are 3 symbols used in hyph-be.tex: ' ` ’
> > I suspect this can confuse the engine, since generate-plain-patterns.rb checks only the first one and converts it to the third one to populate hyph-quote-<lang>.tex
> > What is the official position on quotes? Should one use only ' and let *TeX do the rest, or are other symbols allowed too?
>
> Any symbol is allowed in a hyphenation pattern for TeX as long as you
> set its \lccode correctly, which is done in a file called
> unicode-letters.def, or later within hyph-utf8. If the characters don’t
> have a correct \lccode, you get an error from TeX saying “Non-letter”,
> and since you’re not reporting anything like that, your system seems to
> be set up correctly from that point of view.
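To see which characters will need an explicit \lccode line in a test file like the one quoted further down, here is a small Python sketch that lists every non-digit pattern character Python does not classify as a letter. Python's str.isalpha() is only an approximation of what unicode-letters.def actually sets up, and lccode_lines is a hypothetical name.

```python
def lccode_lines(patterns):
    """Return \\lccode lines for pattern characters that are not letters.

    Digits are hyphenation values and '.' marks a word boundary, so both are
    skipped; everything else that str.isalpha() rejects is reported.
    """
    extra = set()
    for pattern in patterns:
        for ch in pattern:
            if ch.isdigit() or ch == "." or ch.isalpha():
                continue
            extra.add(ch)
    # Same form as the lines in the test file, e.g. \lccode`\’=`\’
    return [f"\\lccode`\\{ch}=`\\{ch}" for ch in sorted(extra)]


for line in lccode_lines(["б1ь", "1’2а", "'3а", "а`1", ".аб1"]):
    print(line)
```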
>
> However, TeX won’t treat the different types of apostrophes in any
> special way; there are no equivalence tables or anything like that. To
> the engine, the different Unicode characters for the apostrophe are
> simply that, different characters. We enforce equivalences such as the
> one between ' and ’ by duplicating every pattern containing an
> apostrophe and putting it in the hyph-quote-* files as you’ve seen, so
> in your case we could do that by putting all patterns with ` and ’ in
> hyph-quote-be.tex, and the patterns with ' in the main file. We can
> update the Ruby scripts to do that.
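The split proposed here is easy to prototype before touching the Ruby scripts. A minimal Python sketch, assuming the patterns are already available as a list of strings; split_by_apostrophe and quote_file_body are hypothetical names, and the real hyph-quote-*.tex files carry headers this sketch omits. Patterns written with ` or ’ go to the quote file, while everything else, including patterns with the ASCII ', stays in the main file.

```python
BACKTICK = "`"
RIGHT_QUOTE = "\u2019"  # ’


def split_by_apostrophe(patterns):
    """Return (main, quote): patterns with ` or ’ go to the quote file."""
    main, quote = [], []
    for pattern in patterns:
        if BACKTICK in pattern or RIGHT_QUOTE in pattern:
            quote.append(pattern)
        else:
            main.append(pattern)
    return main, quote


def quote_file_body(quote_patterns):
    """A bare \\patterns{...} block for hyph-quote-be.tex (no header)."""
    return "\\patterns{\n" + "\n".join(quote_patterns) + "\n}\n"


main, quote = split_by_apostrophe(["б1ь", "1'2а", "’3а", "а`1"])
print(main)
print(quote_file_body(quote))
```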
>
> The reason for having only one type of apostrophe in the main file
> (hyph-be.tex) is so that other programs that have a notion of
> equivalence won’t get confused; this is not about TeX (at least not
> about UTF-8 TeX, see below).
>
> > The third issue with these patterns is the T2A encoding. The U+2019 symbol (the third quote in the list above) makes conversion impossible, since the symbol is not mapped in the converter. I tried to enable it in t2a.dat and regenerate the converter, but it fails with the message: The encoding t2a uses more than two bytes to encode characters.
>
> Yes, of course, in T2A there is only one character slot for the
> apostrophe, so you shouldn’t try and map all the different characters
> one-to-one. This is precisely where the strategy explained in the
> paragraph above helps: if you extract all the different types of
> apostrophes to an auxiliary file and keep only one in the main file, you
> can work around that problem. That said, do you really need to use the
> patterns in an 8-bit encoding?
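To see what forcing all three apostrophes into one 8-bit slot would do, here is a Python sketch that folds ` and ’ onto ASCII ' and then looks for patterns that become identical or conflicting. It assumes, following the point above, that all three characters would share a single T2A slot; fold_for_8bit is a hypothetical helper, not part of the hyph-utf8 converters.

```python
FOLD = str.maketrans({"`": "'", "\u2019": "'"})  # ` and ’ -> '


def fold_for_8bit(patterns):
    """Fold apostrophes to ', drop exact duplicates, report conflicts.

    Returns (kept, conflicts); a conflict is a pair of folded patterns with
    the same letters but different digits, which TeX would reject.
    """
    first_seen = {}  # underlying string -> first folded pattern
    kept, conflicts = [], []
    for pattern in patterns:
        folded = pattern.translate(FOLD)
        key = "".join(ch for ch in folded if not ch.isdigit())
        if key not in first_seen:
            first_seen[key] = folded
            kept.append(folded)
        elif first_seen[key] != folded:
            conflicts.append((first_seen[key], folded))
        # otherwise it is an exact duplicate and is silently dropped
    return kept, conflicts


kept, conflicts = fold_for_8bit(["1'2а", "1’2а", "1`3е", "'3е", "б1ь"])
print(kept)
print(conflicts)
```

Any conflicts this reports would have to be resolved by hand before an 8-bit version could be built.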
>
> In conclusion, I think you should try and test the patterns first; you
> don’t need any of the machinery that hyph-utf8 provides, but for example
> just
>
> ---- BEGIN test-hyph-be.tex
> \catcode`\{=1
> \catcode`\}=2
> \input unicode-letters.def
> \lccode`\'=`\'
> \lccode`\`=`\`
> \lccode`\’=`\’
> \input hyph-be
> % Your text here
> ---- END test-hyph-be.tex
>
> to be compiled with “xetex -ini -etex test-hyph-be.tex”. We’ll do the
> packaging later.
>
> Best,
>
> Arthur