[texhax] lwarp package — Native LaTeX to HTML conversion

Brian Dunn BD at bdtechconcepts.com
Mon Mar 21 17:53:01 CET 2016

On Sunday, March 20, 2016, Deyan Ginev wrote:
> Looking at the implementation
> details, it seems that much of the inevitable customization is again
> present - in order to resolve the "impedance mismatch" between the
> printed page and the hypertext webpage, various high level constructs
> need to be mapped correctly over to the HTML

CSS3 brings extra capability in some areas.

The question of where to intercept the code is an interesting one.  It would 
be desirable to use as much of the existing code as possible in the core and 
commonly-used packages, patching only at a low level where necessary, but so 
much of what LaTeX does is irrelevant to HTML output, and so many packages 
patch each other, that the result can be a mess.  For this reason, many 
packages are "emulated" at the user-interface level.  As time went on, I found 
reason to more closely match lower-level data structures in order to re-use 
some existing code.  Floats are an example; using the existing data structure 
allows the cleveref package to work as-is, even though the high-level float 
code is emulated.  Using \CaptionSeparator from babel is another example.

> the amount of work that it took to get to the current stage

I describe it in the manual as a large number of small technical challenges.  
The first version was LaTeX to Asciidoc, allowing Asciidoc to handle many of 
the low-level details.  Asciidoc is one of the most complete markup languages, 
but of course still had limitations in what could be done.

> the estimated difficulty for adding support for new packages.

Text-related packages may very well work as-is or with minor patches.  For 
example, I have not yet made bibliography citations into hyperlinks, but they 
do appear in the text as-is.  Anything with graphics can be embedded 
inside a "lateximage", which will appear as an SVG image in the HTML result.  
See the math environments as an example.  (Described in more detail a few 
paragraphs below.)

The real work goes into adapting features that are necessary but which, in 
LaTeX, include a lot of underlying code that has to be re-interpreted for HTML.  
Floats are a big example, but so is something like algorithmic.  \hfill and 
friends are not easily translated, so some patching with CSS was required.

By the way, digging through all the old code makes me appreciate modern 
packages such as xparse, etoolbox, xifthen, xstring, everyhook, calc, 
arrayjobx, zref, environ, printlen, etc.  Thanks to all these authors!

> I see that there is a Tikz -> SVG support already
> operational. Is that building on the tex4ht driver for SVG directly?

The environment is embedded inside a "lateximage".  The way it works is that 
LaTeX draws the image on a page by itself, and also writes external 
instructions to grab that single page, crop it tightly, convert it to SVG, and 
rename it.  Meanwhile, LaTeX places a reference to that name in the HTML output.
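
As a rough illustration of those external instructions, the sketch below 
builds the command lines for one image.  The tool choice (pdfseparate, 
pdfcrop, pdftocairo), the function name, and the file names are assumptions 
for illustration, not necessarily what lwarp itself writes out.

```python
def lateximage_commands(pdf, page, name):
    """Return command lines that turn one PDF page into a cropped SVG.

    Assumed tools (not confirmed by the post): pdfseparate and pdftocairo
    from poppler-utils, and pdfcrop from TeX Live.
    """
    single = f"{name}-page.pdf"   # the single page grabbed from the full PDF
    cropped = f"{name}-crop.pdf"  # the same page after tight cropping
    return [
        # Grab only the page that holds this image.
        ["pdfseparate", "-f", str(page), "-l", str(page), pdf, single],
        # Crop tightly to the drawn content.
        ["pdfcrop", single, cropped],
        # Convert to SVG, writing straight to the final name.
        ["pdftocairo", "-svg", cropped, f"{name}.svg"],
    ]


if __name__ == "__main__":
    # Hypothetical project file and image name.
    for cmd in lateximage_commands("project_html.pdf", 3, "image-3"):
        print(" ".join(cmd))
```

The commands are only built here, not executed, so they can be inspected or 
written to a script before anything is run.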

The same procedure is used for math, which has the unfortunate side-effect of 
littering the project with images, and there is not yet any mechanism for 
reuse.  Copy/paste yields the LaTeX expression for each math object, inline 
with the regular text.
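
A minimal sketch of what such an image reference might look like, assuming 
the LaTeX source is carried in the image's alt text.  The function name, the 
file name, and the exact markup are hypothetical and are not taken from lwarp.

```python
from html import escape


def math_image_tag(svg_name, latex_source):
    """Build an <img> tag whose alt text holds the original LaTeX source.

    Assumption: the alt attribute is what carries the expression for
    copy/paste; the real markup may differ.
    """
    # Escape so that characters such as < and & in the math stay valid HTML.
    alt = escape(latex_source, quote=True)
    src = escape(svg_name, quote=True)
    return f'<img src="{src}" alt="{alt}">'


if __name__ == "__main__":
    print(math_image_tag("image-7.svg", r"\(a < b\)"))
```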

There are some counterarguments in favor of using 300 dpi PNG or GIF instead 
of SVG, or of using MathML with some post-processing.

Brian Dunn
BD Tech Concepts LLC
bd at BDTechConcepts.com
