[tex-hyphen] Special languages

Mojca Miklavec mojca.miklavec.lists at gmail.com
Fri Apr 30 13:22:08 CEST 2010

On Thu, Apr 29, 2010 at 07:30, Manuel Pégourié-Gonnard wrote:
> On 29/04/2010 06:15, Manuel Pégourié-Gonnard wrote:
>> With English, it would make 5 "real" languages included in the format.
>> Currently, there are 65 of them (69, but excluding the 4 languages with
>> (almost) no patterns here too, to be fair). So, we can cut the number of
>> languages rehashed at format loading to 1/13 *and* keep 100% perfect
>> backwards compatibility with almost no effort (it would be very easy to
>> implement).
> A few numbers, based on running "time lualatex '\stop'" with various
> versions of lualatex.fmt (LuaTeX 0.60.1):
> - With all languages in the format (current status in TL): ~ 2.4 seconds
> - With only english: ~ 0.19 seconds
> - With english, dumylang, nohyphenation, german-x-2009-06-19,
> ngerman-x-2009-06-19, ibycus, arabic, farsi, mongolianlmc: ~ 0.39 seconds.

To answer your question about mongolianlmc: I figured out that I have
    language_codes['mn-cyrl']       = 'mn'
    language_codes['mn-cyrl-x-lmc'] = nil
in generate-plain-patterns.rb. There's a long story to be told about
that, but here are some short facts:
- Mongolian has only "a single set of rules to hyphenate", so in
theory one would not need two files
- one needs two different encodings in 8-bit TeX, so 8-bit TeX does
need to have two languages (one for each encoding; though they could
come from the same set of patterns); lmc only makes sense for MonTeX
and should not be needed in LuaTeX
- as it happens, the patterns are different since two authors
created them independently; the only reason you would want to load
"mongolianlmc" is if you care exactly which of the two pattern
variants gets loaded, but that's not even supported in
- the author of "lmc" is willing to do some work to port MonTeX to
XeTeX (maybe LuaTeX) and he'll rethink many things anyway
- there's a desire to unify the two versions of patterns later this year

Considering those facts, I do not think it's worth caring about
loading mongolianlmc into LuaTeX at this moment. We may consider it
later, when its author comes back from a time-demanding job, but by
then we'll have a single version of the patterns left anyway.
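Roughly speaking, a `nil` entry in such a mapping is a way to skip a language when the plain-TeX pattern files are generated. A minimal sketch of that idea, assuming that unlisted codes are used unchanged; the `plain_pattern_targets` helper, the default rule and the sample codes are hypothetical and not the actual code of generate-plain-patterns.rb:

```ruby
# Hypothetical mapping from hyph-utf8 codes to the codes used for the
# generated plain-TeX files. Assumption: unlisted codes map to themselves.
language_codes = Hash.new { |_hash, code| code }
language_codes['mn-cyrl']       = 'mn'
language_codes['mn-cyrl-x-lmc'] = nil # explicit nil: do not generate

# Return [hyph-utf8 code, target code] pairs, dropping codes mapped to nil.
def plain_pattern_targets(codes, language_codes)
  codes.map do |code|
    target = language_codes[code]
    [code, target] unless target.nil?
  end.compact
end

targets = plain_pattern_targets(%w[de-1996 mn-cyrl mn-cyrl-x-lmc], language_codes)
targets.each { |code, target| puts "#{code} -> #{target}" }
# prints:
# de-1996 -> de-1996
# mn-cyrl -> mn
```

An explicit `nil` differs from a missing key here: the default block only applies to keys that were never assigned, so `mn-cyrl-x-lmc` is dropped while other codes pass through.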

> PS: loading time essentially unchanged (non-measurable) by removing
> dumylang, nohyphenation, arabic and farsi from the list. So, really no need
> to worry about them.

In the end you'll probably want to access data about languages
in Lua form as well, so in the long run it might make sense to have
something like
    "arabic" = { ... hyphenate=false, ...}
anyway, but that's up to you.
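A rough sketch of the kind of per-language record suggested above, written in Ruby to match the generator scripts rather than in Lua; the `hyphenate` key and the sample entries are assumptions for illustration, not an existing file format:

```ruby
# Hypothetical per-language metadata, mirroring an entry such as
# arabic = { ..., hyphenate = false, ... } in a Lua table.
LANGUAGES = {
  'english' => { hyphenate: true },
  'ngerman' => { hyphenate: true },
  'arabic'  => { hyphenate: false },
  'farsi'   => { hyphenate: false },
}.freeze

# Languages whose patterns actually have to be read; entries marked
# hyphenate: false have no patterns to load.
def languages_with_patterns(languages)
  languages.select { |_name, data| data[:hyphenate] }.keys
end

p languages_with_patterns(LANGUAGES) # => ["english", "ngerman"]
```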

One of the main reasons why you wanted Lua-based loading of
patterns was all the catcodes. With arabic you'll hardly have any
problems :) :) :)

> PPS: the implementation could go as follows: when the format is built, for
> each language (in language.dat), look in language.dat.lua to see whether we
> can load it at runtime. If so (and the language is not english), don't load
> it in the format. The advantage of such a solution is that, if for any reason
> language.dat.lua is lagging behind language.dat, the additional languages
> still work (only increasing the startup time).
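A minimal sketch of the PPS procedure, assuming the names from language.dat and the set of names that language.dat.lua can load at runtime are already available as plain lists; `languages_for_format` and the sample data are hypothetical:

```ruby
require 'set'

# Hypothetical sketch: decide which language.dat entries still have to be
# dumped into the format. `dat_languages` stands for the names in
# language.dat, `lua_loadable` for the names language.dat.lua can load.
def languages_for_format(dat_languages, lua_loadable)
  dat_languages.select do |name|
    # english always stays in the format; anything language.dat.lua cannot
    # load (e.g. because it lags behind language.dat) also stays, so it
    # keeps working at the cost of a slower startup.
    name == 'english' || !lua_loadable.include?(name)
  end
end

dat = %w[english ngerman german-x-2009-06-19 ibycus french]
lua = Set.new(%w[english ngerman french])
p languages_for_format(dat, lua)
# => ["english", "german-x-2009-06-19", "ibycus"]
```

The fallback is the point of the design: a language missing from the Lua side is never lost, it only costs startup time.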

As far as language.dat is concerned:
1.) I still claim that you don't want to support ibycus

2.) If a language is not defined in language.dat.lua, this means
that the language is not present in hyph-utf8, so there is no
guarantee that the format will compile at all. When could that happen?
a) German timestamped patterns (for which it might make sense to get
better support in LuaTeX anyway) - hopefully without compatibility
b) If some other author creates non-UTF-8-compatible patterns and
convinces Karl or Norbert to create a tlpsrc file with an AddHyphen entry
c) If the author of patterns for some other language decides to do the same as the Germans

3.) Keep in mind that MiKTeX might have its own way of loading
patterns (I have no idea how it works), but it doesn't support LuaTeX
yet anyway, so you can worry about the details later.


One more request: I would be really glad if we didn't have to have
en-us-x-knuth in our repository.
1.) The patterns are the same; ushyphmax only contains a few more
2.) You may load it at format-generation time from hyphen.tex if you
want to load exactly those patterns.

Of course we may leave the proper entry in the Lua table.
