[tex-hyphen] Mongolian & encodings

Oliver Corff corff at zedat.fu-berlin.de
Tue Mar 23 10:18:08 CET 2010


Hi Mojca,
Hello everybody,

Finally I come to answer your mail... and I have to explain why 
everything takes such a long time.

I must confess that I am totally absorbed by scientific work at the
moment (a major dictionary) and, apart from its traditional-script
support for Mongolian and Manju, I rarely use even my own MonTeX at all.

Language and computer use in Mongolia are overwhelmingly geared towards
a Cyrillic environment. The classical script is only used in academic,
formal and decorative settings, not in practical applications like
books, newspapers or (internet) communication. Nonetheless, there is a
strong need for a system that can seamlessly integrate all worlds of
writing --- this is the guiding thought which led me to write MonTeX.

When I wrote MonTeX, a universal Cyrillic standard was close to being
considered a pipe dream (see the plethora of Cyrillic input encodings
provided with MonTeX), and now that we have Unicode and functioning
Cyrillic support on all sorts of computing platforms, the Mongolians
/continue/ to modify code positions 128-255 of Arial and other M$ fonts
in order to run Windows-1251 lookalike codepages even though the systems
do speak
Unicode these days. Writing software applications which use German 
umlauts (ä, ö, ü) and running these on vanilla office computers in 
Mongolia frequently makes these characters (needed for transliterations) 
unusable because somebody installed that homebrew modified font which 
will clutter up everything in the wrong place. And as long as people 
know these fonts are out there they continue to write Word documents and 
create Web pages which will require exactly these crippled fonts... It 
sucks. In order to avoid these totally mindless problems I continue to
work in pure ASCII and let TeX do the generation of nice Cyrillic and
Mongolian traditional script. Besides, there are linguistic advantages
to being able to sort modern Xalx Mongolian and classical Mongolian in
traditional writing according to the same input alphabet... So MonTeX
continues to have a strong raison d'être in an admittedly small community.

I do feel strongly that the limited Metafont set supplied with MonTeX is 
not up to the standards of what we enjoy today. I used to stay with 
Metafont because I like the elegance of the creative process, the 
language is pleasing, the concept is pleasing. Alas, the world has gone 
in a different direction. The type1 fonts contributed to MonTeX were 
made by a kind fellow and author of a popular LaTeX book quite a few 
years ago.

Given the mainstream popularity of Babel it is entirely understandable
why we should support a Babel-style environment which can access modern
fonts --- yet in my recent work (a dictionary in five languages and more
scripts: Manju, Tibetan, Mongolian, Uighur (in Arabic script) and Chinese)
I started using XeLaTeX, even abandoning my own ctib Tibetan system, due
to the completeness of the now available Tibetan font. Then I abandoned
ArabTeX because some Arabic OpenType fonts have more glyphs (and I need
some historical material for which I never figured out how ArabTeX would
do exactly what I needed): hence I switched to XeLaTeX. So my next thought
is not how to port the complete MonTeX functionality to Babel, but to 
XeLaTeX; in the field where I work XeLaTeX font support is superior to 
everything we've seen so far, and is indispensable. So I gave up 
understanding the workings of a Babel system (and was not even aware of 
the Babel support for Mongolian prior to your first mail; the Mongolian 
community is small but highly fragmented---I developed my system 
basically at the Academy of Sciences of Mongolia about 10 years ago, and 
younger talents coming from different institutions have entirely studied 
abroad; sometimes it is difficult to keep track of each other).

To make a long story short:

My suggestion is to keep the ASCII Input -> Cyrillic and Traditional 
Script Output functionality of MonTeX alive in a new environment, 
preferably based on XeLaTeX.

XeLaTeX is capable of accepting ASCII input and producing Script output 
(e.g. consult the ArabXeTeX package!)

The hyphenation support and language settings will not be taken care of 
by Babel, but by polyglossia.

I have no idea yet what such a hyphenation file should look like in order
to accommodate both ASCII and Cyrillic input methods.

The flawed hyphenation patterns I generated more than a decade ago are
mainly due to the incomplete dictionary my colleague and I based our
work on.

Please, please, please understand that I am overwhelmed by the monstrous
dictionary work mentioned above, which I've been working on for more than
a decade and which is supposed to go to print this summer (and MonTeX was
made for this purpose very much like TeX was made for typesetting math,
if I am allowed to make this comparison); until summer of this year, I
have no chance to do any meaningful work on porting MonTeX to
contemporary Cyrillic fonts and making it work in XeLaTeX (currently I am
using MonTeX in XeLaTeX without relying on the beautiful font support of
XeTeX: a shame!). Currently my only focus is on finishing the dictionary.

There are a lot of modifications I tweaked into MonTeX in order to
create complicated Manju transliterations of Tibetan text, but so far I
have not been able to consolidate everything into a new MonTeX version.
Can we postpone the renaming issue? It is not the only issue I want to
fix.
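
For orientation, the renaming discussed in the quoted mail below would
presumably come down to something like the following change in
language.dat. This is only a sketch: the language names and pattern file
names are taken from Mojca's mail, and the real entries in a given
TeX Live release may look different.

```
% current entries, as described in the quoted mail:
%   mongolian     mnhyph.tex    % LMC-encoded patterns (MonTeX)
%   mongolian2a   mnhyphn.tex   % T2A-encoded patterns (Dorjgotov Batmunkh)
%
% entries after the proposed renaming:
mongolianlmc  mnhyph.tex    % LMC-encoded patterns, used by MonTeX only
mongolian     mnhyphn.tex   % T2A-encoded patterns, picked up by babel
```

On the MonTeX side, the line in mls.sty that selects the patterns would
then have to refer to \l@mongolianlmc instead of \l@mongolian (again an
assumption derived from the quoted mail, not a tested change).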

Please accept my apologies for this lengthy mail.

Best regards,
Oliver.



On 22.03.2010 02:54, Mojca Miklavec wrote:
> Hello Karl & others,
>
> I took a bit more time and wrote a longer reply. There are two issues
> with Mongolian patterns. The first one is that there are two sets (but
> since there's a unique set of rules for hyphenation, there is a chance
> that the two authors will be willing to agree on a single set) and the
> second one is that they are in different encodings. I will concentrate
> only on encodings in this mail.
>
> A bit of background. There are two "big players" in the history of
> support for Mongolian in TeX: Oliver Corff and Dorjgotov Batmunkh.
> Both contributed quite a lot of material (and Mongolian support in
> LaTeX is pretty complex anyway since Mongolian can be written in an
> infinite number of scripts and directions).
>
> Oliver Corff was the first one to publish any patterns. He wrote his
> own "system", called MonTeX which included packages, patterns and
> fonts, but also other things:
> - http://ctan.org/tex-archive/language/mongolian/montex/
> - http://ctan.org/tex-archive/language/mongolian/MNT/
> - http://ctan.org/tex-archive/language/mongolian/mxd/
> - http://ctan.org/tex-archive/language/mongolian/soyombo/
>
> On the other hand Dorjgotov Batmunkh contributed Babel support,
> translated "The not so short introduction to LaTeX", generated
> patterns (and also some documents that show where Oliver's patterns
> break in the wrong way), ...
>
> When Arthur and I started with hyph-utf8, the language.dat file in
> TeX Live was using:
> - Oliver's patterns in LMC encoding (mnhyph.tex) under the name "mongolian"
> - Dorjgotov's patterns in T2A encoding (mnhyphn.tex) under the name
> "mongolian2a" (since the name mongolian was not available any more)
>
> See:
>
> http://www.tug.org/svn/texlive/trunk/Master/texmf/tex/generic/hyphen/mnhyph.tex?view=log&pathrev=34
>
> http://www.tug.org/svn/texlive/trunk/Master/texmf/tex/generic/hyphen/mnhyphn.tex?view=log&pathrev=5096
>
> Dorjgotov released his patterns not that long before we started the
> work ... They were imported to TL in October 2007, while Oliver's have
> been there since the beginning (revision 34) and are usually loaded
> with
>      \language\number\l@mongolian
> inside mls.sty.
>
> The main problem now is that there is
>      /usr/local/texlive/2009/texmf-dist/tex/latex/mongolian-babel/mongolian.ldf
> but \usepackage[mongolian]{babel} would load the LMC-encoded patterns.
> One could rename mongolian.ldf to mongolian2a.ldf and then use
> \usepackage[mongolian2a]{babel}, but that would be stupid, in particular
> because there's no other "conflicting babel support" that would force
> one to use "mongolian2a" as a language name.
>
> Maybe the best possible solution at this time would be to convince
> Oliver Corff to allow us to rename his "mongolian" to "mongolianlmc"
> and let him fix the line in his support mls.sty (no single user would
> be affected by that) and rename "mongolian2a" to "mongolian", to let
> babel support work properly (which would make many people happy).
>
> On Sat, Mar 20, 2010 at 23:38, Karl Berry wrote:
>>
>>     - one author (of the old patterns) wants to have automatic
>>     transliteration (he types in latin alphabet and wants the
>>     corresponding cyrillic glyphs in the resulting document) which is
>>     probably only possible with the proper font, but there's hardly any
>>     font in that encoding present (LMC);
>>
>> I've never heard of LMC.
>
> http://ctan.org/tex-archive/language/mongolian/montex/
>
>> How is the author using these patterns now, ie, what font?
>
> His own metafont font. So [I first thought that] Type 3 (bitmap) was
> the only available font that can be used with these patterns. Or at
> least that's what CTAN says and the montex documentation uses Type 3.
> However I see that there is
>      /usr/local/texlive/2009/texmf-dist/fonts/type1/public/montex/
> but I don't know where those outlines come from. Anyway: I guess that
> that's the only font that supports his LMC encoding.
>
> The main point of LMC encoding is that it allows transliteration (one
> writes in ASCII and gets the pdf typeset in Cyrillic without letting
> TeX notice that at all) and the author wants to keep that
> functionality. However, I guess that this functionality only works in
> connection with his package, so if we leave the functionality there
> with his package, we should not worry about anything else.
>
>>   If the
>> font(s) he is using is/are not in TL, we could forget it.
>
> It is, but the way to use it is a bit unconventional. Here are some
> paragraphs from
>      /usr/local/texlive/2009/texmf-dist/doc/latex/montex/montex.tex
>
>
> \section{\MonTeX\ and Recent \TeX\ Trends}
>
> As soon as the LH Cyrillic fonts support the Mongolian currency sign,
> \MonTeX\ will switch to this font set. At the moment the
> private encoding \LMC\ is favoured over LH; future implementations
> of \MonTeX\ will provide a smooth transition for the user: documents
> developed with older versions of \MonTeX\ will be upward compatible.
>
> The \texttt{babel} package will, perhaps, also be supported in due
> course; at the moment, \texttt{babel} support is lacking mainly due
> to font encoding questions and a private RL setup. At present,
> \MonTeX\ is \emph{not} built with \texttt{babel} compatibility in mind.
> It must be seen as a stand-alone extension similar to
> \texttt{german.sty} or the \textsf{CJK} package.
>
> ...
>
> \section{Hyphenation Patterns}
>
> \MonTeX\ provides hyphenation rules for Modern Mongolian (Xalx).
> ... hyphenation patterns for Russian exist at CTAN
> but they are unfortunately not suited for \MonTeX\ without prior
> work.
>
> ... A format file is usually
> created when a new \TeX\ or \LaTeXe\ system is installed, but creating
> a new format can be done at any later time again. A special variant
> of \TeX\ called \texttt{initex} is used for this purpose.
> The procedure sounds more intimidating than it actually is.
> Since there are many different types of \TeX\ installations, the
> procedure is somewhat system-dependent. There is detailed on-line
> documentation available for performing this task, either in form of
> a text file for emtex, or in form of a FAQ file which can be
> displayed using the command \texttt{texconfig faq} on teTeX systems.
>
> Mojca
>
>
> On Mon, Nov 23, 2009 at 17:27, Oliver Corff wrote:
>> Dear Mojca,
>>
>> No problem. I'll do that in about two weeks from now.
>>
>> BTW, this is a good opportunity for me to clean up MonTeX code and adapt
>> MonTeX to XeLaTeX.
>>
>> I've never dealt with Babel so I am a bit at a loss there.
>>
>> Despite modern encodings, I still cherish the possibility to
>> write in a transliteration (pure ASCII) and have the system do the
>> conversion.
>>
>> Best regards,
>> Oliver.
>>
>> PS: Due to the different code spaces, merging romanized and Cyrillic
>> Mongolian hyphenations into ONE file should not be a big problem.
>
> (It is not possible since some slots overlap.)
>
>
> Summary (for those who managed to read this mail until this point): in
> my opinion it would be best to rename "mongolian" to "mongolianlmc" in
> language.dat and "mongolian2a" to "mongolian" + ask Oliver Corff to
> fix his package. That way the users of Oliver's package would not be
> affected and users of Dorjgotov's babel support would benefit a lot
> with the ability to use babel+patterns in an easy way (now they need
> to hack manually).
>
