[tex-hyphen] Adding Latvian language (hyphenation and babel)

Mojca Miklavec mojca.miklavec.lists at gmail.com
Mon Nov 3 02:07:41 CET 2008


Hello Ilmars,

On Mon, Nov 3, 2008 at 1:10 AM, Ilmars Poikans wrote:
>
> I hope the Lesser General Public Licence (LGPL) 2.1 (that's the license in
> the hyph readme) is OK for including those patterns in TeX?

Definitely.

>> Since recently, there is a package for hyphenation patterns that is
>> called hyph-utf8, and maintained by Mojca Miklavec and me.  We can add
>> the Latvian patterns right away.
>
> I have cleaned up the hyphenation file (removed repeated patterns and
> commented out two patterns that TeX complained were duplicates, although
> I didn't find any repeats).

There might be similar patterns, for example a1a and a2a, which differ
only in their digits. I haven't checked the patterns themselves yet.
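
One way to look for such near-duplicates is to strip the digits from every
pattern and group the patterns by the letters that remain. The following is
only a minimal sketch: the function names and the sample list are made up for
illustration, and it assumes the patterns are already available as a list of
strings rather than read from the actual Latvian file.

```python
import re
from collections import defaultdict

def letters_only(pattern: str) -> str:
    """Drop the hyphenation digits, keeping letters and boundary dots."""
    return re.sub(r"[0-9]", "", pattern)

def find_conflicts(patterns):
    """Return {letter skeleton: [patterns]} for skeletons that occur with
    more than one distinct digit placement."""
    groups = defaultdict(list)
    for p in patterns:
        key = letters_only(p)
        if p not in groups[key]:  # ignore exact repeats
            groups[key].append(p)
    return {k: v for k, v in groups.items() if len(v) > 1}

# Hypothetical sample, not taken from the real pattern file.
sample = ["a1a", "a2a", ".ab1", "b1c", "a1a"]
conflicts = find_conflicts(sample)
print(conflicts)
```

Exact repeats are skipped on purpose, since those were already removed by
hand; only patterns that share letters but differ in digits are reported.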

> There weren't any problems when running xelatex with this hyphenation file,
> but non-Unicode engines complained about invalid patterns. I guess \input
> conv-utf8-ec.tex in loadhyph-lv.tex isn't the right converter.

No, it's definitely not. There are many Latvian characters missing there.

> Probably
> not all Latvian special characters are available in any single encoding
> in TeX. So, what should we do about 8-bit engines and patterns?

I wanted to say "nothing", but then I remembered that we added Latin7
(=ISO-8859-13) for Lithuanian, so maybe you could use that one.
However, I would add a special note in any case. Latin7-encoded fonts
are hard to find, so it makes sense to use UTF-8 anyway.

See http://www.tug.org/svn/texhyphen/trunk/hyph-utf8/tex/generic/hyph-utf8/conversions/
until it shows up on CTAN. One thing that I don't like is the current
naming scheme (il2, il3, but l7x). I would vote for renaming l7x, I'm
just not quite sure to what (il7, latin7, ... but we want to keep
consistency).

>> The Babel maintainer is Johannes
>> Braams, but he tends to be quite unresponsive and updates Babel very
>> infrequently.  But you don't actually need to go through him in order to
>> add a new language to Babel.  All you need is to create a latvian.ldf
>> file containing, at least, the translations for the captions
>> (\prefacename, \refname, etc.  Just look up any *.ldf file for the
>> list), and any setting you need for typesetting Latvian.
>
> Won't there be problems with special Latvian chars? I have found some quite
> old latvian.ldf files on the Internet; they prepend \=, \c, \v to special
> chars. Here is the link:
>
> http://home.lanet.lv/~drikis/TeX/
>
> I don't know if you will see the special Latvian chars in the table, but I
> can prepare a list of Latvian char codes in Unicode and ISO8859-13 (win-1257
> in Windows) (http://en.wikipedia.org/wiki/ISO_8859-13).
>
> The already existing gloss-latvian.ldf in the XeLaTeX polyglossia package
> gives all(?) translations. The question remains how to encode special chars
> in them.

Polyglossia is meant to be used with XeLaTeX, so you don't need to
care about encoding. If you are talking about babel, I would suggest
coordinating with the Lithuanians. They uploaded a package to
CTAN recently. Maybe you could make a "baltic" package together where
you would include both Latvian and Lithuanian files. If you want to
use pdfTeX, then you both depend on more or less the same set of
files.

Both Sigitas Tolusis and Vytas Statulevicius replied to us not so
long ago and uploaded Lithuanian files to CTAN. I guess that you
should be able to reach them and try to convince them to make a
package together with you.

>> Something we need to know for the hyph-utf8 package is what font
>> encoding you use in LaTeX.  Is there some preferred encoding for Latvian
>> TeX documents?  Unfortunately we don't support multiple encodings for
>> the moment, but that's on our roadmap.
>
> I would prefer everything in Unicode :) I guess almost all documents
> nowadays are written in Unicode (OpenOffice, MS Office). There is also the
> 8-bit encoding ISO8859-13 (win-1257 in Windows) for the Baltic states.
>
> That 8-bit charset should cover the Latvian, Estonian and Lithuanian
> languages, but I don't know which encodings are used in the Lithuanian and
> Estonian LaTeX packages,

Lithuanian used to have a dedicated encoding that has been abandoned.
Now they are using Latin7 (ISO-8859-13).

But the question is not whether you write your documents in Latin7 or
UTF-8. The question is whether you process them with pdfTeX or XeTeX.
With pdfTeX you need to depend on some encoding. And if you declare
your patterns to be in Latin7, they will only work properly in Latin7.
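
To make the encoding dependency concrete, here is a small sketch that
compares the bytes of the same Latvian text in UTF-8 and in Latin7. It uses
Python's built-in `iso8859_13` codec as a stand-in for what an 8-bit engine
sees; the letter list and the sample word are chosen here for illustration
and are not taken from the pattern file.

```python
# Latvian letters with diacritics, lower case (illustrative list).
LATVIAN = "āčēģīķļņšūž"

# Encoding raises UnicodeEncodeError if a letter is missing from Latin7.
latin7 = {ch: ch.encode("iso8859_13") for ch in LATVIAN + LATVIAN.upper()}
all_single_byte = all(len(b) == 1 for b in latin7.values())

# The same word gives different byte strings in the two encodings,
# so bytes prepared for one encoding do not match text in the other.
word = "māja"
utf8_bytes = word.encode("utf-8")
latin7_bytes = word.encode("iso8859_13")

print(all_single_byte)
print(utf8_bytes != latin7_bytes)
```

Each of these letters is one byte in Latin7 but two bytes in UTF-8, which is
why an 8-bit engine needs the patterns converted to the encoding that is
actually in use.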

> but for Latvian it has been the standard for quite a long time. There was a
> Latvian-specific encoding in ancient times (even before Win 95), but I guess
> it has been dead for a long time, because of the ISO standards coming into
> place.

I would not try to wake it up now.

>> Incidentally, we added Lithuanian patterns last week, so Latvian will
>> be a welcome addition :-)
>
> I can't let Latvian fall too far behind the other Baltic states :)

I agree. Feel free to start with ConTeXt translations :)

Mojca