[tex-hyphen] Czech and Slovak hyphenation patterns

Mojca Miklavec mojca.miklavec.lists at gmail.com
Mon Jun 23 12:27:47 CEST 2008


Hello,

thanks a lot for the detailed explanation.

On Mon, Jun 23, 2008 at 11:06 AM, Petr Olsak wrote:
>
> Hello Mojca,
>
> I looked at your files and I see that you are solving the pattern
> conversion in a way that depends on the TeX engine used (8-bit versus UTF-8).
> Only one 8-bit encoding is supported now (T1 encoding of EC fonts) for our
> languages.

True. I suspected that you might be using ISO Latin 2 (though I have
looked into it once and noticed that it's not really IL2, in
particular it didn't cover Slovenian characters, but that's not
important at all - we don't really need it, it's just that it's not
100% compatible).

> It looks right, but we have implemented a slightly different approach in
> csplain/cslatex. I describe here the principles of csplain (cslatex is
> similar, but for details you can ask Z. Wagner or P. Tesarik).
>
> csplain loads Knuth's English patterns plus the Czech patterns by P. Sevecek
> in two encodings (iso-8859-2 and T1) and the Slovak patterns by J. Chlebikova
> in the same two encodings. In summary, five sets of patterns are loaded.
>
> The iso-8859-2 encoding (named il2) is the default in csplain, so the file
> il2code.tex is loaded immediately in the initex state. The \csaccents command
> is defined there. With this command the user can change the original behavior
> of the \v, \', etc. commands (implemented by the \accent primitive and
> described in the TeXbook) to "expansion behavior", where (for example) \v c
> expands to the single character ccaron. Loading the patterns (converting from
> the "\v c" form to the il2 encoding) looks like this:
>
> \begingroup
>   \language=\iltwoczech
>   \csaccents
>   \input czhyphen.tex  % hyphen patterns by Sevecek in "\v c" form
> \endgroup
>
> The Slovak patterns are read in a similar way.
> Next, the same patterns are loaded a second time in the T1 (that is, EC)
> encoding. This is done by the following steps:
>
> \begingroup
>   \input t1code.tex    % different \csaccents is defined here
>   \language=\toneczech
>   \csaccents
>   \input czhyphen.tex  % hyphen patterns by Sevecek in "\v c" form
> \endgroup
>
> The same is done when loading the Slovak hyphenation patterns.

Thanks a lot for the explanation.

> What we will need to do if the pattern sources are stored in utf8: First,
> a converter to the il2 encoding, similar to conv-utf8-ec.tex, has to be written.

I can provide that converter, no problem.
What should be the source? Simply the upper part of ISO Latin 2? (We
probably don't need any changes in the lower part? Any other special
characters needed outside of the ISO specification?)
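
To make the question concrete, here is a minimal sketch of what I imagine
such a converter could look like for an 8-bit engine. It is only an
illustration: the file name conv-utf8-il2.tex is hypothetical, it covers just
four Czech letters, and it assumes the target slots are plain ISO-8859-2
positions. The idea is that the UTF-8 lead byte becomes an active character
that reads the following byte and expands to the single 8-bit character.

```tex
% conv-utf8-il2.tex (hypothetical name): sketch only, four letters.
% In an 8-bit engine a UTF-8 "c with caron" arrives as the bytes C4 8D.
\catcode"C4=13 % make the lead byte of U+0100..U+013F active
\catcode"C5=13 % make the lead byte of U+0140..U+017F active
\def^^c4#1{%
  \ifx#1^^8d^^e8\else % U+010D c with caron -> 0xE8
  \ifx#1^^9b^^ec\else % U+011B e with caron -> 0xEC
  \fi\fi}
\def^^c5#1{%
  \ifx#1^^99^^f8\else % U+0159 r with caron -> 0xF8
  \ifx#1^^a1^^b9\else % U+0161 s with caron -> 0xB9
  \fi\fi}
% Pattern characters need a nonzero \lccode; a real setup may
% already set these elsewhere (assumption), shown for completeness.
\lccode"E8="E8 \lccode"EC="EC \lccode"F8="F8 \lccode"B9="B9
```

Since \patterns expands its argument, the active lead bytes would be replaced
by the 8-bit characters while the patterns are being read. A continuation byte
that is not listed simply produces nothing in this sketch; a real converter
would need the full table.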

> Second: I can try to change the hyphen.lan file (which is used for pattern
> loading in csplain) in order to load and use your files and converters. I can
> add a new encoding to csplain if a Unicode-ready TeX engine is detected, so
> each pattern will be loaded three times: in the il2 encoding (default), in the
> t1 encoding (EC fonts) and in Unicode (Unicode fonts). The user can choose
> the preferred encoding at the beginning of each document.

This is all completely up to you. But it would be really nice if the
work could be unified and if the old files could go away (in order not
to cause even more confusion and in order not to start shipping
incompatible patterns). So if you would be ready to do that, it would
be most welcome.
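
As a rough sketch of what I have in mind (not a tested proposal), the loading
could keep the structure you described, with a single UTF-8 source and one
converter per encoding. The names conv-utf8-il2.tex and czhyphen-utf8.tex are
hypothetical placeholders; \iltwoczech and \toneczech are taken from your
example, and I am assuming that conv-utf8-ec.tex can simply be \input inside a
group in the same way.

```tex
\begingroup
  \input conv-utf8-il2.tex  % hypothetical converter: UTF-8 bytes -> il2
  \language=\iltwoczech
  \input czhyphen-utf8.tex  % hypothetical name of the UTF-8 pattern source
\endgroup

\begingroup
  \input conv-utf8-ec.tex   % converter: UTF-8 bytes -> T1 (EC)
  \language=\toneczech
  \input czhyphen-utf8.tex  % same source file, read a second time
\endgroup
```

The catcode changes and definitions made by the converters stay local to each
group, while the patterns themselves are stored globally, so the two loads
should not interfere with each other.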

What about in LaTeX itself? Should the patterns also be loaded twice
with two different encodings (when one simply uses
\usepackage[slovak]{babel})? (I guess not, but I don't really know.)

> The relevant files from the csplain format can be found in TeX Live or at
> ftp://math.feld.cvut.cz/pub/olsak/csplain. Note especially csplain.ini,
> hyphen.lan, il2code.tex and t1code.tex.

Thanks a lot, I'll look into it.

2008/6/23 Zdenek Wagner wrote:
> Just a small comment: IL2 is not ISO-8859-2; the lower part is OT1, the
> upper part is ISO-8859-2.

xl2.enc is a bit different. Can you comment on the question above?
Is it enough to provide conversions for the upper part of IL2 only?
(For IL3 I have only added five letters or so :)

> The default encoding in cslatex is IL2
> (exactly as in csplain). TL2007 contains code (probably written by
> Jonathan Kew) that allows reading the same hyphenation patterns in
> UTF-8.

In TL2008 all the xu-* files written by Jonathan will be dropped.

Mojca

