[tex-hyphen] tamil and malayalam

Mojca Miklavec mojca.miklavec.lists at gmail.com
Sat Mar 20 07:39:20 CET 2010


On Fri, Mar 19, 2010 at 22:13, Karl Berry wrote:
>    Maybe TL 2010 would offer better timing since it would
>    give us a bit more time to test.
>
> I hope to be getting TL 2010 out earlier than previous years

Thanks for the information.

> (I always hope this).

:)

>    One question: if language.dat says
>        somelanguage:T1 loadhyph-xx.tex
>    is it possible to access language name (somelanguage:T1) inside
>    loadhyph-xx.tex somehow?
>
> I don't know.  Maybe.  We'd have to look into precisely how language.dat
> and language.def are parsed (by babel and etex.src respectively, I
> think).

I'll try to take a look into it.
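
For reference, babel's language.dat lists one pattern set per line: a language name followed by the file to load. A % starts a comment, and a line beginning with = declares a synonym for the entry above it. The Python sketch below is not how babel or etex.src actually parse the file. It only shows what a loader would need to receive if the hypothetical name:ENCODING convention from the question were adopted. Entry and parse_language_dat are illustrative names.

```python
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    name: str                # name as written, e.g. "somelanguage:T1"
    base: str                # part before the (hypothetical) colon
    encoding: Optional[str]  # part after the colon, or None if absent
    file: str                # file to load, e.g. "loadhyph-xx.tex"
    synonyms: tuple = ()     # names from following "=..." lines


def parse_language_dat(text: str) -> list:
    """Split language.dat-style lines into entries (illustrative sketch)."""
    entries = []
    for raw in text.splitlines():
        line = raw.split("%", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        if line.startswith("="):
            # A synonym attaches to the entry immediately above it.
            if not entries:
                raise ValueError("synonym before any language: " + raw)
            last = entries[-1]
            entries[-1] = last._replace(
                synonyms=last.synonyms + (line[1:].strip(),))
            continue
        fields = line.split()
        if len(fields) < 2:
            raise ValueError("expected '<name> <file>': " + raw)
        name, file = fields[0], fields[1]
        # Hypothetical convention: "name:ENCODING" carries the font encoding.
        base, sep, enc = name.partition(":")
        entries.append(Entry(name, base, enc if sep else None, file))
    return entries


sample = """\
% comment
english hyphen.tex
=usenglish
somelanguage:T1 loadhyph-xx.tex
"""
entries = parse_language_dat(sample)
```

With this split, a loader would get the base name and the encoding as separate values. Whether babel or etex.src can hand anything like that to loadhyph-xx.tex is still the open question above.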

>    - we have added 12 new patterns to the package (new version will be
>
> How big are they?  Are we talking hundreds of new patterns, thousands,
> ..?  (Not sure how close we are to filling up pattern space.)

The 12 new pattern files are not particularly big: one is several
thousand lines long, and the other eleven are on the order of a
hundred lines each.

But if we are talking about how much the number of loaded patterns
would grow if we started supporting different encodings automatically
(ConTeXt does that, for example), it could easily double. Or, as an
extreme example: Russian has 7 different patterns and 5 different
encodings, which could easily result in 35 loaded patterns (but we
won't go that way). I'm not sure whether we should add the
functionality to load the same patterns multiple times for different
encodings. Mongolian is one of the cases where we probably don't have
much choice:

- one author (of the old patterns) wants automatic transliteration (he
types in the Latin alphabet and wants the corresponding Cyrillic
glyphs in the resulting document). That is probably only possible
with the proper font, but there are hardly any fonts in that encoding
(LMC). I guess one could use a mapping in XeTeX, but in pdfTeX I don't
know of any elegant solution for that

- the other author (and many other users, I guess) would want to use
T2A, which is widely available in fonts
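
To make the multiplication above concrete: if every pattern set were loaded once per supported font encoding, the number of loaded sets is the product of the two counts. In this small Python sketch, pattern_loads is an illustrative helper, and the Russian pattern and encoding names are placeholders, since only the counts (7 and 5) are given above. The Mongolian case uses the two encodings mentioned, LMC and T2A.

```python
from itertools import product


def pattern_loads(patterns: list, encodings: list) -> list:
    # One entry per (pattern, encoding) pair: a hypothetical scheme in which
    # each pattern set is loaded separately for every supported encoding.
    return [f"{p}:{e}" for p, e in product(patterns, encodings)]


# Placeholder names: only "7 patterns, 5 encodings" is known for Russian.
russian = pattern_loads([f"ru-{i}" for i in range(1, 8)],
                        [f"enc{j}" for j in range(1, 6)])

# Mongolian: one pattern set, needed in both LMC and T2A.
mongolian = pattern_loads(["mongolian"], ["LMC", "T2A"])
```

Here len(russian) is 7 * 5 = 35, while Mongolian needs only two copies.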

Maybe I just care too much about efficiency. ConTeXt used to take half
a minute to process a "hello world" text (long, long ago), and I
remember statements like: "people complained that ConTeXt was slow,
but when LaTeX decided to enable all the patterns, it became
noticeably slower as well". I also remember that Norbert had to
increase memory limits several times during the initial development
of hyph-utf8 (in particular when we decided to put all those patterns
into plain as well, with all the synonyms loading a copy of the
patterns).

Mojca

PS: I took another look at the OpenOffice patterns. Astonishingly,
many of them have been developed further, so their patterns differ
from ours in many more languages than I had imagined.


