[tex-hyphen] Czech and Slovak hyphenation patterns

Zdenek Wagner zdenek.wagner at gmail.com
Mon Jun 23 12:38:35 CEST 2008


2008/6/23 Mojca Miklavec <mojca.miklavec.lists at gmail.com>:
> Hello,
>
> thanks a lot for the detailed explanation.
>
> On Mon, Jun 23, 2008 at 11:06 AM, Petr Olsak wrote:
>>
>> Hello Mojca,
>>
>> I looked at your files and I see that you are solving the problem of
>> pattern conversion, which depends on the TeX engine used (8-bit versus
>> utf8). Only one 8-bit encoding is supported now (T1 encoding of EC fonts)
>> for our languages.
>
> True. I suspected that you might be using ISO Latin 2 (though I have
> looked into it once and noticed that it's not really IL2, in
> particular it didn't cover Slovenian characters, but that's not
> important at all - we don't really need it, it's just that it's not
> 100% compatible).
>
You can find it in il2enc.def, and you can also make a table from csr10
(using testfont.tex). The upper part of IL2 is just a subset of
ISO-8859-2. The encoding was designed when computers were slow:
generating each character with MF took several seconds, and in an
extreme case generating a full font took half an hour (MS-DOS,
12 MHz 80286, MF from emTeX). Characters not needed for Czech and
Slovak were therefore omitted to save time.
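
As a rough sketch (assuming plain TeX and that the CS fonts are
installed so that csr10 can be loaded; the names \csr and \slot are
made up for this example), the upper half of the font can also be
listed with a short loop:

```tex
% Sketch only: list slots 128-255 of csr10, one slot per line.
% \csr and \slot are hypothetical names chosen for this example.
\font\csr=csr10
\newcount\slot
\slot=128
\loop
  % \endgraf is used instead of \par because plain TeX's \loop
  % does not accept \par inside its body.
  \noindent\number\slot: {\csr\char\slot}\endgraf
  \advance\slot by 1
\ifnum\slot<256 \repeat
\bye
```

Running this with plain tex gives one line per slot; slots that are
missing from the font simply come out empty. The \table command of
testfont.tex shows the same information in a more compact layout.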

>> It looks right, but we have implemented a slightly different approach in
>> csplain/cslatex. Here I describe the principles of csplain (cslatex is
>> similar, but for details you can ask Z. Wagner or P. Tesarik).
>>
>> csplain loads Knuth's English patterns plus the Czech patterns by
>> P. Sevecek in two encodings (iso-8859-2 plus T1) and the Slovak patterns by
>> J. Chlebikova in the same two encodings. In summary, five pattern sets are
>> loaded.
>>
>> The iso-8859-2 encoding (named il2) is the default in csplain, so the file
>> il2code.tex is loaded right away in the iniTeX run. The \csaccents command
>> is defined there. With this command the user can change the original
>> behavior of the \v, \', etc. commands (implemented by the \accent primitive
>> and described in the TeXbook) to "expansion behavior", where (for example)
>> \v c expands to the single character ccaron. Loading the patterns
>> (converting from the "\v c" form to the il2 encoding) looks like this:
>>
>> \begingroup
>>   \language=\iltwoczech
>>   \csaccents
>>   \input czhyphen.tex  % hyphen patterns by Sevecek in "\v c" form
>> \endgroup
>>
>> The Slovak patterns are read in a similar way.
>> Next, the same patterns are loaded a second time in the T1 (that is, EC)
>> encoding. This is done by the following steps:
>>
>> \begingroup
>>   \input t1code.tex    % different \csaccents is defined here
>>   \language=\toneczech
>>   \csaccents
>>   \input czhyphen.tex  % hyphen patterns by Sevecek in "\v c" form
>> \endgroup
>>
>> Similar work is done when loading the Slovak hyphenation patterns.
>
> Thanks a lot for the explanation.
>
>> What we will need to do if the pattern sources are stored in utf8: First,
>> a converter to the il2 encoding, similar to conv-utf8-ec.tex, has to be
>> written.
>
> I can provide that converter, no problem.
> What should be the source? Simply the upper part of ISO Latin 2? (We
> probably don't need any changes in the lower part? Any other special
> characters needed outside of the ISO specification?)
>
>> Second: I can try to change the hyphen.lan file (which is used for pattern
>> loading in csplain) in order to load and use your files and converters. I
>> can add a new encoding to csplain if a Unicode-ready TeX engine is detected,
>> so each pattern set will be loaded three times: in the il2 encoding
>> (default), in the t1 encoding (EC fonts) and in Unicode (Unicode fonts). The
>> user can choose the preferred encoding at the beginning of each document.
>
> This is all completely up to you. But it would be really nice if the
> work could be unified and if the old files could go away (in order not
> to cause even more confusion and in order not to start shipping
> incompatible patterns). So if you would be ready to do that, it would
> be most welcome.
>
> What about in LaTeX itself? Should the patterns also be loaded twice
> with two different encodings (when one simply uses
> \usepackage[slovak]{babel})? (I guess not, but I don't really know.)
>
>> The relevant files from the csplain format can be found in TeX Live or at
>> ftp://math.feld.cvut.cz/pub/olsak/csplain. Note especially csplain.ini,
>> hyphen.lan, il2code.tex and t1code.tex.
>
> Thanks a lot, I'll look into it.
>
> 2008/6/23 Zdenek Wagner wrote:
>> Just small comments: IL2 is not ISO-8859-2, the lower part is OT1, the
>> upper part is ISO-8859-2.
>
> xl2.enc is a bit different. Can you comment on the question above?
> Is it enough to provide conversions for the upper part of IL2
> only? (For IL3 I have only added five letters or so :)
>
>> The default encoding in cslatex is IL2
>> (exactly as in csplain). TL2007 contains code (probably written by
>> Jonathan Kew) that allows reading the same hyphenation patterns in
>> UTF-8.
>
> In TL2008 all the xu-* files written by Jonathan will be dropped.
>
> Mojca
>



-- 
Zdeněk Wagner
http://hroch486.icpf.cas.cz/wagner/
http://icebearsoft.euweb.cz

