[tex-hyphen] Mongolian & encodings

Mojca Miklavec mojca.miklavec.lists at gmail.com
Mon Mar 22 02:54:59 CET 2010


Hello Karl & others,

I took a bit more time and wrote a longer reply. There are two issues
with Mongolian patterns. The first one is that there are two sets (but
since there is a unique set of rules for hyphenation, there is a chance
that the two authors will be willing to agree on a single set) and the
second one is that they are in different encodings. I will concentrate
only on encodings in this mail.

A bit of background. There are two "big players" in the history of
support for Mongolian in TeX: Oliver Corff and Dorjgotov Batmunkh.
Both contributed quite a lot of material (and Mongolian support in
LaTeX is pretty complex anyway since Mongolian can be written in an
infinite number of scripts and directions).

Oliver Corff was the first one to publish any patterns. He wrote his
own "system", called MonTeX which included packages, patterns and
fonts, but also other things:
- http://ctan.org/tex-archive/language/mongolian/montex/
- http://ctan.org/tex-archive/language/mongolian/MNT/
- http://ctan.org/tex-archive/language/mongolian/mxd/
- http://ctan.org/tex-archive/language/mongolian/soyombo/

On the other hand, Dorjgotov Batmunkh contributed Babel support,
translated "The not so short introduction to LaTeX", generated
patterns (and also some documents that show where Oliver's patterns
break in the wrong way), ...

When Arthur and I started with hyph-utf8, the language.dat file in
TeX Live was using:
- Oliver's patterns in LMC encoding (mnhyph.tex) under the name "mongolian"
- Dorjgotov's patterns in T2A encoding (mnhyphn.tex) under the name
"mongolian2a" (since the name mongolian was not available any more)

See:

http://www.tug.org/svn/texlive/trunk/Master/texmf/tex/generic/hyphen/mnhyph.tex?view=log&pathrev=34

http://www.tug.org/svn/texlive/trunk/Master/texmf/tex/generic/hyphen/mnhyphn.tex?view=log&pathrev=5096
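
For those who have not looked at language.dat before: each line maps a
language name to the file that loads the patterns. The two entries
described above would look roughly like this. This is only a sketch
written from memory of the file format, not a copy of the actual
TeX Live file, and the comments are mine:

```tex
% language.dat (sketch of the two Mongolian entries, not the real file)
mongolian    mnhyph.tex    % Oliver's patterns, LMC encoding
mongolian2a  mnhyphn.tex   % Dorjgotov's patterns, T2A encoding
```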

Dorjgotov released his patterns not that long before we started the
work ... They were imported to TL in October 2007, while Oliver's have
been there since the beginning (revision 34) and are usually loaded
with
    \language\number\l@mongolian
inside mls.sty.
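
In case that line looks cryptic: when the format is built, every name
listed in language.dat gets a number stored in \l@<name>, and assigning
that number to \language selects the corresponding patterns. A minimal
sketch of a more defensive variant follows. It is not what mls.sty
contains, and the message text is made up for illustration:

```tex
\makeatletter
% \l@mongolian only exists if the format was built with a
% "mongolian" entry in language.dat.
\@ifundefined{l@mongolian}%
  {\typeout{No hyphenation patterns named 'mongolian' in this format}}%
  {\language\number\l@mongolian}%
\makeatother
```

The point is only that the name used in language.dat and the name used
in the package have to match, so renaming one means changing the other.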

The main problem now is that there is
    /usr/local/texlive/2009/texmf-dist/tex/latex/mongolian-babel/mongolian.ldf
but \usepackage[mongolian]{babel} would load the LMC-encoded patterns.
One could rename mongolian.ldf to mongolian2a.ldf and then use
\usepackage[mongolian2a]{babel}, but that would be stupid, in particular
because there's no other "conflicting babel support" that would force
one to use "mongolian2a" as a language name.

Maybe the best possible solution at this time would be to convince
Oliver Corff to allow us to rename his "mongolian" to "mongolianlmc"
and let him fix the line in his support mls.sty (no single user would
be affected by that) and rename "mongolian2a" to "mongolian", to let
babel support work properly (which would make many people happy).
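
To make the proposal concrete, here is a sketch of what would change.
The language.dat lines are my approximation of the file format rather
than a copy of the real file, and "mongolianlmc" is just the name
suggested above, not something that exists yet:

```tex
% language.dat after the proposed rename (sketch)
mongolianlmc  mnhyph.tex    % Oliver's patterns, LMC encoding
mongolian     mnhyphn.tex   % Dorjgotov's patterns, T2A encoding

% mls.sty, the one line that would need to follow the rename:
%   old: \language\number\l@mongolian
%   new: \language\number\l@mongolianlmc
```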

On Sat, Mar 20, 2010 at 23:38, Karl Berry wrote:
>
>    - one author (of the old patterns) wants to have automatic
>    transliteration (he types in latin alphabet and wants the
>    corresponding cyrillic glyphs in the resulting document) which is
>    probably only possible with the proper font, but there's hardly any
>    font in that encoding present (LMC);
>
> I've never heard of LMC.

http://ctan.org/tex-archive/language/mongolian/montex/

> How is the author using these patterns now, ie, what font?

His own metafont font. So [I first thought that] Type 3 (bitmap) was
the only available font that can be used with these patterns. Or at
least that's what CTAN says and the montex documentation uses Type 3.
However I see that there is
    /usr/local/texlive/2009/texmf-dist/fonts/type1/public/montex/
but I don't know where those outlines come from. Anyway: I guess that
that's the only font that supports his LMC encoding.

The main point of LMC encoding is that it allows transliteration (one
writes in ASCII and gets the pdf typeset in Cyrillic without letting
TeX notice that at all) and the author wants to keep that
functionality. However, I guess that this functionality only works in
connection with his package, so if we leave the functionality there
with his package, we should not worry about anything else.

> If the
> font(s) he is using is/are not in TL, we could forget it.

It is, but the way to use it is a bit unconventional. Here are some
paragraphs from
    /usr/local/texlive/2009/texmf-dist/doc/latex/montex/montex.tex


\section{\MonTeX\ and Recent \TeX\ Trends}

As soon as the LH Cyrillic fonts support the Mongolian currency sign,
\MonTeX\ will switch to this font set. At the moment the
private encoding \LMC\ is favoured over LH; future implementations
of \MonTeX\ will provide a smooth transition for the user: documents
developed with older versions of \MonTeX\ will be upward compatible.

The \texttt{babel} package will, perhaps, also be supported in due
course; at the moment, \texttt{babel} support is lacking mainly due
to font encoding questions and a private RL setup. At present,
\MonTeX\ is \emph{not} built with \texttt{babel} compatibility in mind.
It must be seen as a stand-alone extension similar to
\texttt{german.sty} or the \textsf{CJK} package.

...

\section{Hyphenation Patterns}

\MonTeX\ provides hyphenation rules for Modern Mongolian (Xalx).
... hyphenation patterns for Russian exist at CTAN
but they are unfortunately not suited for \MonTeX\ without prior
work.

... A format file is usually
created when a new \TeX\ or \LaTeXe\ system is installed, but creating
a new format can be done at any later time again. A special variant
of \TeX\ called \texttt{initex} is used for this purpose.
The procedure sounds more intimidating than it actually is.
Since there are many different types of \TeX\ installations, the
procedure is somewhat system-dependent. There is detailed on-line
documentation available for performing this task, either in form of
a text file for emtex, or in form of a FAQ file which can be
displayed using the command \texttt{texconfig faq} on teTeX systems.

Mojca


On Mon, Nov 23, 2009 at 17:27, Oliver Corff wrote:
> Dear Mojca,
>
> No problem. I'll do that in about two weeks from now.
>
> BTW, this is a good opportunity for me to clean up MonTeX code and adapt
> MonTeX to XeLaTeX.
>
> I've never dealt with Babel so I am a bit at a loss there.
>
> Still, despite modern encodings, I still cherish the possibility to
> write in a transliteration (pure ASCII) and have the system do the
> conversion.
>
> Best regards,
> Oliver.
>
> PS: Due to the different code spaces, merging romanized and Cyrillic
> Mongolian hyphenations into ONE file should not be a big problem.

(It is not possible since some slots overlap.)


Summary (for those who managed to read this mail until this point): in
my opinion it would be best to rename "mongolian" to "mongolianlmc" in
language.dat and "mongolian2a" to "mongolian" + ask Oliver Corff to
fix his package. That way the users of Oliver's package would not be
affected and users of Dorjgotov's babel support would benefit a lot
from the ability to use babel+patterns in an easy way (now they need
to hack manually).
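
After such a rename, a document along the following lines should pick up
the T2A-encoded patterns without any manual hacking. This is an untested
sketch: the UTF-8 input encoding and the sample text are my assumptions,
and the mongolian-babel documentation may recommend different options.

```tex
\documentclass{article}
\usepackage[T2A]{fontenc}      % font encoding matching the T2A patterns
\usepackage[utf8]{inputenc}    % assumption: the source is saved as UTF-8
\usepackage[mongolian]{babel}  % loads mongolian.ldf and selects "mongolian"
\begin{document}
Монгол хэл
\end{document}
```

With the current language.dat, the same document would end up with the
LMC-encoded patterns instead, which is the problem described above.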


