[texhax] latexml discussion

Deyan Ginev d.ginev at jacobs-university.de
Tue Mar 22 15:47:46 CET 2016

Dear all,

Let me clarify a few high-level points here, but if you really want to know
more about latexml I suggest reading the manual [1] or joining its
mailing list [2], to avoid cluttering [texhax] with side projects.

On 03/22/2016 06:00 AM, Philip Taylor wrote:
> David Carlisle wrote:
>> I can only assume you've never looked at xii.tex, the example 
>> mentioned earlier in the thread, (which is typical plain tex markup,
>>  not "basic LaTeX markup" :-)
> Er, no. Should I have ?!

When I was first introduced to the world of TeX, "xii.tex" had legendary
status at my university :-) It's a privilege to have the original author
(David) participate in this email thread.

> OK, so if "xii.tex" is anything to go by, this Perl script is capable of
> handling at least some of TeX's more arcane features.

First, and this is important, please do not refer to latexml as a "Perl
script". Scripts are one-off programs, typically between 100 and 1000
lines of code, that offer a quick solution to a small task. There are
many actual Perl scripts that try to parse LaTeX, of course, but latexml
goes beyond that.

With over 20,000 lines of code, and following best practices for code
organization and quality (via perltidy and perlcritic), the core of
LaTeXML is a well-documented library for interpreting and manipulating
TeX sources.

In response to the suggestion to reimplement it in Lisp: I wouldn't
take a reimplementation effort lightly, as you could spend the better
part of a year working on that (if not more). That said, moving to a
lower-level language is currently under active discussion internally,
as performance is an acknowledged weakness at the moment.

A different sign of maturity is that latexml currently has a plugin
ecosystem [3] (still growing, and the interfaces are still stabilizing),
which offers various extensions - e.g. an output capability for docx and
odt files currently resides in an extension, and so do several
tex-to-html web services, including the one behind the showcase that I
linked to in my last reply. Mature plugins (e.g. complete bindings for
LaTeX packages) tend to be merged into the core project, which is
usually a pleasant workflow.

> What would be
> really useful (IMHO) is if the author could list (a) the more arcane
> features that it is known to support (e.g., catcode changes, lc/uccode
> changes, \uppercase/lowercase, ^^ notation, ^^^^ notation, etc) and (b)
> those features that it is known not yet to support, if any (if he/she
> has not already done so). Also the extent to which it supports some or
> all of the extensions and enhancements offered by (e-TeX, PdfTeX, XeTeX,
> ...).

If we had an exhaustive list of all features offered by the various
engines, then we could go through it and tell you what latexml can
handle (where applicable). Compiling that list may be the hardest part
here. It also sounds like a checklist worth adding to the manual.
Btw, I think the examples you enumerated are all supported.
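
As a rough illustration only (a sketch written for this message, not a
file from the latexml test suite; the file name features.tex is
hypothetical), a plain TeX test covering a few of those features could
look like the following. The ^^^^ notation is left out because it needs
an engine such as XeTeX or LuaTeX.

```tex
% features.tex: hypothetical test file, written for plain TeX
\catcode`\!=\active   % catcode change: ! becomes an active character
\def!{(bang)}         % define the active character as a macro
Active character: !

\lccode`\A=`\z        % lccode change: the lowercase of A is now z
\lowercase{A}         % typesets "z"

\uppercase{shout}     % typesets "SHOUT"

^^41^^42              % ^^ notation with hex digits: typesets "AB"
\bye
```

Comparing what TeX itself typesets for such a file with what latexml
produces is one way to spot a mismatch worth reporting.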

Ever since latexml managed to successfully interpret the raw TeX of
tikz.sty and pgf.sty, I think our perspective in the team has changed
from "we try our best, but we're nowhere near complete" to "any specific
failure to interpret TeX is an unknown bug, and we would like to hear
about it and patch it". The official goal for the 1.0 milestone is to
reach full parity with TeX, so at least we're being ambitious.

We're at version 0.8.1 right now (with 0.8.2 coming out soon).

> Kaveh wrote :
>> Phil, I wouldn't give it much chance of parsing your complex macros. 
>> From what I remember even TeX struggles with those. ;-)
> Oh, it usually manages (on a good day). The hard part (so my Polish
> friends tell me) is translating the five levels of subordinate clauses
> that I typically use in my descriptions of how they work :-)

Feel free to try latexml on (ever larger) minimal examples of your
macros, and if you see it gasping for breath, please let us know! It
has become ever harder to find isolated flaws in the TeX interpretation,
so any help with that would be very valuable.


> ** Phil.

[1] LaTeXML manual in HTML

[2] LaTeXML mailing list

[3] LaTeXML plugins on GitHub
