[Tugindia] LaTeX to XML conversions based on DTD

Radhakrishnan CV cvr at river-valley.org
Thu Sep 15 08:59:29 CEST 2011

On Wed, Sep 14, 2011 at 7:42 PM, Suresh Avvarulakshmi, Integra-PDY, IN
<suresh.avvarulakshmi at integra.co.in> wrote:
> Please let us know whether we can convert XML, based on different
> DTD's from LaTeX source using LaTeXml.
> For example,
> 1.       LaTeX to NLM DTD
> 2.       LaTeX to TFJA DTD
> 3.       LaTeX to A++ DTD

I have no experience with LaTeXML, the Perl-based converter from LaTeX
to XML. Since nobody else has commented, I thought I might offer a
general opinion based on my own experience.

Since LaTeX has a well-developed macro language, and authors often
resort to clever programming while writing, it is a daunting task to
convert a LaTeX document into other markup formats, particularly XML,
which itself comes in many flavours. The plethora of DTDs adds to the
difficulty.

There are different approaches to the problem:

1. Parse the LaTeX document and emit XML, e.g., LaTeXML, Tralics, etc.

2. Create a customized DVI by injecting the appropriate elements,
   attributes, etc. as \special's, then extract them with a
   post-processor, e.g., Hermes, Lxir, TeX4ht, etc.

3. Create variant markup that matches LaTeX syntax and post-process it
   with XSLT to create XML or LaTeX, e.g., tbook, TeXML, etc.

Each of the above approaches has its own advantages and difficulties.
The first one is what developers will resort to. As you can imagine, it
ends up as a custom TeX implementation (one that can come nowhere near
what Knuth has done in his infinite wisdom) written to digest the
various LaTeX functions and libraries, and it still presents surprises
on a daily basis whenever you happen to process a document by a clever
author.
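
To make the difficulty of the first approach concrete, here is a
minimal sketch in Python of a naive parse-and-emit converter. It is not
how LaTeXML or Tralics work. The function name latex_to_xml and the
command-to-element mapping are invented for illustration and do not
follow any real DTD. It understands only commands of the form
\name{...} and gives up on everything else.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical mapping from LaTeX commands to XML element names.
# These names are assumptions for illustration, not taken from any DTD.
COMMAND_TO_ELEMENT = {"section": "title", "emph": "italic"}

# One token is: a command with an opening brace, a closing brace,
# or a run of plain text.
TOKEN = re.compile(r"\\([A-Za-z]+)\{|\}|[^\\{}]+")


def latex_to_xml(source):
    """Convert a tiny subset of LaTeX to XML, failing on anything unknown."""
    out = []
    stack = []  # element names that are open and waiting for their brace
    pos = 0
    while pos < len(source):
        match = TOKEN.match(source, pos)
        if match is None:
            raise ValueError(f"cannot parse input at offset {pos}")
        pos = match.end()
        token = match.group(0)
        if token == "}":
            if not stack:
                raise ValueError("unbalanced closing brace")
            out.append(f"</{stack.pop()}>")
        elif match.group(1) is not None:
            name = match.group(1)
            if name not in COMMAND_TO_ELEMENT:
                # No macro expansion here: author macros simply fail.
                raise ValueError(f"unknown command: \\{name}")
            element = COMMAND_TO_ELEMENT[name]
            stack.append(element)
            out.append(f"<{element}>")
        else:
            out.append(escape(token))
    if stack:
        raise ValueError("unbalanced opening brace")
    return "".join(out)
```

Calling latex_to_xml(r"\section{A \emph{small} test}") returns
<title>A <italic>small</italic> test</title>, but an author-defined
macro such as \vect{x} raises ValueError, because nothing here expands
macros the way TeX does.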

The second approach makes use of the TeX engine itself, meaning you are
relieved of the burden of processing horrendous author macros and the
functions provided by LaTeX packages. You only need to write a basic
customization layer in the LaTeX macro language for your XML DTD, which
is fairly easy.
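
The post-processing half of the second approach can be sketched in
Python as follows. This is only an illustration under stated
assumptions: it presumes the DVI has already been decoded into a list
of (kind, payload) events, and the marker syntax "xml:open NAME" and
"xml:close" is invented here. It is not the format used by Hermes, Lxir
or TeX4ht, and the function name build_xml is hypothetical as well.

```python
import xml.etree.ElementTree as ET

OPEN_PREFIX = "xml:open "  # hypothetical marker syntax, not a real tool's
CLOSE_MARKER = "xml:close"


def build_xml(events, root_name="article"):
    """Rebuild an XML tree from typeset text interleaved with markers."""
    root = ET.Element(root_name)
    stack = [root]
    for kind, payload in events:
        if kind == "special" and payload.startswith(OPEN_PREFIX):
            child = ET.SubElement(stack[-1], payload[len(OPEN_PREFIX):])
            stack.append(child)
        elif kind == "special" and payload == CLOSE_MARKER:
            if len(stack) == 1:
                raise ValueError("close marker without matching open")
            stack.pop()
        elif kind == "text":
            target = stack[-1]
            if len(target):
                # Text after a child element belongs to that child's tail.
                last = target[-1]
                last.tail = (last.tail or "") + payload
            else:
                target.text = (target.text or "") + payload
        # Any other event (e.g. unrelated specials) is ignored.
    if len(stack) != 1:
        raise ValueError("unclosed element at end of input")
    return ET.tostring(root, encoding="unicode")
```

The point of the design is that the text events come from TeX after it
has expanded every author macro, so the post-processor never has to
understand LaTeX at all; it only follows the markers that the
customization layer injected.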

The third method is for authors who want to use LaTeX for PDF output
and XML/MathML output for online delivery from the same sources. It is
not suited to text processing houses.

Based on my experience, I would go for the second option, where my
preference is TeX4ht, which provides extraordinary opportunities, hooks
and tricks to process any complex LaTeX document and generate any kind
of custom XML/MathML from it.

I hope this comment helps, although I am sorry that I could not offer
first-hand experience of LaTeXML.

Best regards
