[tex-hyphen] Adding latvian language (hyphenation and babel)

Arthur Reutenauer arthur.reutenauer at normalesup.org
Mon Nov 3 04:08:36 CET 2008


	Hello again,

> I have cleaned up the hyphenation file (removed repeated patterns, and 
> commented out two patterns that TeX complained were duplicates, although 
> I couldn't find any repeats)

  As Mojca said, "duplicate" here refers to the underlying string of
letters, ignoring the digits, so "a2b" would be a duplicate of "a1b";
and although I don't like the idea of TeX rejecting those patterns, one
has to admit that having both of them in the same pattern file is a
misfeature.  You should simply delete the pattern with the smaller
number: the larger number takes precedence at any given position
anyway, so nothing is lost.
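
  Such duplicates are hard to spot by eye, so here is a minimal sketch
in Python of how one might list them.  It assumes the patterns have
already been extracted into a list of strings; the function name
find_duplicate_patterns and the sample patterns are made up for
illustration and are not taken from the Latvian file.

```python
import re
from collections import defaultdict

def find_duplicate_patterns(patterns):
    """Group patterns whose letters are identical once digits are removed."""
    groups = defaultdict(list)
    for pattern in patterns:
        # "a1b" and "a2b" both reduce to the key "ab"
        key = re.sub(r"[0-9]", "", pattern)
        groups[key].append(pattern)
    # Keep only the keys that occur more than once
    return {key: group for key, group in groups.items() if len(group) > 1}

# Hypothetical sample patterns, for illustration only
sample = ["a1b", "a2b", ".ab1c", "1ba", "2b1a"]
duplicates = find_duplicate_patterns(sample)
for key, group in duplicates.items():
    print(key, group)
```

  The sketch only reports the groups; which pattern of each group to
keep is still to be decided by hand.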

>                                                            I guess \input 
> conv-utf8-ec.tex in loadhyph-lv.tex isn't the right convertor.

  No, it isn't.  The ec encoding (a.k.a. T1 for LaTeX) has slots for the
characters used in a great number of European languages written in the
Latin alphabet, but not all of those needed for the Baltic Rim languages.

>                                                                Probably 
> there isn't any encoding in TeX that has all the Latvian special 
> characters. So, what should we do with 8-bit engines and patterns?

  Actually, there is one: the L7x encoding (modified from latin-7) in
the Lithuanian package that has just been uploaded to CTAN.  I would
suggest using it, after checking that it's appropriate for Latvian, of
course.
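
  A rough way to do such a coverage check is sketched below in Python.
It is only a proxy: L7x is not a Python codec, so the sketch tests
ISO 8859-13 (latin-7), from which L7x is derived, and uses ISO 8859-1
merely as a stand-in for an encoding that falls short.  The letter list
and the function name missing_letters are assumptions made for this
example; the real check has to be done against the L7x table itself.

```python
# The 11 Latvian letters with diacritics, in lower- and uppercase
# (assumed list; the historical letters ō and ŗ are left out).
LATVIAN_LETTERS = "āčēģīķļņšūž" + "ĀČĒĢĪĶĻŅŠŪŽ"

def missing_letters(codec, letters=LATVIAN_LETTERS):
    """Return the letters that the given Python codec cannot encode."""
    missing = []
    for letter in letters:
        try:
            letter.encode(codec)
        except UnicodeEncodeError:
            missing.append(letter)
    return missing

# ISO 8859-13 (latin-7) is the encoding L7x was modified from
print(missing_letters("iso8859_13"))
# ISO 8859-1 is used here only as an example of insufficient coverage
print(missing_letters("iso8859_1"))
```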

>> (\prefacename, \refname, etc.  Just look up any *.ldf file for the
>> list), and any setting you need for typesetting Latvian.

> Won't there be problems with special Latvian chars?

  No.  LaTeX language support is designed to handle that, by using its
own way of representing characters with diacritics, called LICR (for
"LaTeX Internal Character Representation", or something like that).
Basically, if you select an appropriate font encoding with fontenc, then
writing something like "\c k" will be interpreted as a single character,
*not* a TeX box with a base character and a diacritic mark.  Thus, it
can be used in patterns; but it obviously means that you have to use
that encoding in the patterns!  This is what we emulate in the hyph-utf8
package with the conv-utf8-*.tex files.

> I don't know if you will see special latvian chars in table, but I can 
> prepare list of latvian char codes in unicode and ISO8859-13 (win-1257 in 
> windows) (http://en.wikipedia.org/wiki/ISO_8859-13).

  The Estonians have already done that for you :-)

  http://www.eki.ee/letter/chardata.cgi?lang=lv+Latvian&script=latin

> The already existing gloss-latvian.ldf in the XeLaTeX polyglossia package 
> gives all(?) translations. The question remains how to encode special 
> chars in them.

  LICR should be fine.

> I would prefer everything in Unicode :) I guess almost all documents 
> nowadays are written in unicode (openoffice, ms office). There is also 8bit 
> encoding ISO8859-13 (win-1257 in windows) for Baltic states.

  Input encoding and font encoding are really two different things
here, because LaTeX and ConTeXt know how to convert from one to the
other.  You don't need to stick to some specific input encoding, but in
order for your document to be processed, you need to use a font encoding
that contains all the characters in your text; otherwise, well, the
missing characters simply do not appear in the output file.

  That's what we mean by encoding here: we need to define some encoding
that covers all the Latvian characters, but it seems that L7x already
does.  Apart from that, you're free to choose any input encoding in your
source file; font encoding is only an internal problem inside hyph-utf8.
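
  To illustrate why the input encoding is a free choice, here is a small
Python analogy (not what LaTeX does internally): the same word stored in
two input encodings gives different bytes, yet decodes to the same
characters.  The sample word is chosen for illustration only.

```python
word = "ķirsis"  # Latvian for "cherry"; sample word for illustration

# The same text saved with two different input encodings
utf8_bytes = word.encode("utf-8")
latin7_bytes = word.encode("iso8859_13")

# The byte sequences on disk differ...
print(utf8_bytes != latin7_bytes)
# ...but decoding each with its own encoding recovers the same text
print(utf8_bytes.decode("utf-8") == latin7_bytes.decode("iso8859_13"))
```

  Once the text has been decoded, only the font encoding determines
whether every character has a slot, which is the part hyph-utf8 has to
care about.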

  Note that the above is really only relevant for 8-bit TeX engines.
You don't need any font encoding for XeTeX and LuaTeX.  Thus, you could
disregard the issue of 8-bit engines completely, and expect users to use
UTF-8 engines exclusively for Latvian documents, but I don't find that
satisfying, because we seem to have a solution here.

>> Incidentally, we added Lithuanian patterns last week, so Latvian will
>> be a welcome addition :-)
>
> I can't allow Latvian to lag too far behind the other Baltic states :)

  Sure :-)  Estonian has been in Babel for ages!

	Arthur

