[texhax] lwarp package — Native LaTeX to HTML conversion

Deyan Ginev d.ginev at jacobs-university.de
Mon Mar 21 03:28:44 CET 2016

Dear all,

Since I noticed there is a brief tex -> html discussion going on, let me
add one more perspective here. I'm a contributor to the LaTeXML
conversion project, which I am guessing is the one implied by the
reference to "Perl" in Uwe's last email.

On 20.03.2016 20:08, Uwe Lueck wrote:
>> This is a LaTeX package which causes LaTeX to directly generate HTML tags,
>> using pdftotext and a few other utilities to convert the resulting PDF file
>> into HTML files.
> The approach is interesting, yet if you convert LaTeX to PDF 
> and the result to HTML, the meaning of "direct" forbids calling 
> this a "direct" generation of HTML. It is just as "direct" as 
> the late Eitan Gurari's tex4ht.

Another point to sneak in here: now that libraries such as "pdf.js"
exist, and browsers have native PDF previews via the HTML5 canvas,
having a TeX-near mapping from PDF to HTML may have little added value.
In a way it is only worth the effort if it produces a "better" HTML5
document, where "better" is parametric in the goal the author is trying
to achieve (e.g. eBooks have different requirements than web sites or,
to pick something even more modern, interactive exercise sheets).

> I have never used tex4ht, but my impression is that this is the 
> most promising way to get HTML from LaTeX. So you should tell
> what lwarp offers that tex4ht doesn't.

Well, that measure of being "promising" depends on the goal you're
trying to achieve...

> Now as to "native": There are LaTeX-to-HTML converters that use 
> Perl or things like this (Pandoc). For those I could accept 
> calling them "direct" conversion, but not "native", as they use 
> external software for the conversion rather than the LaTeX 
> typesetting system.

This seems to be a fair distinction.

>  A problem with this approach is that the 
> author's custom macros cannot be processed.

Well, that's not entirely true. It's definitely true for early-stage
reimplementations (such as pandoc), which are quite limited in coverage.
LaTeXML, on the other hand, already supports an impressive subset of TeX.
A brief illustration could be the classic xii.tex example [1].

It is certainly not complete in coverage yet, with about 61% of
arXiv.org's academic sources converting without issues [2]. But it's
also definitely not a toy project at this stage.

That being said, I find lwarp.sty to be a fascinating project, and the
"Alternatives" section in its manual is a rather on-point overview of
the current solution attempts [3]. Looking at the implementation
details, it seems that much of the inevitable customization is again
present - in order to resolve the "impedance mismatch" between the
printed page and the hypertext webpage, various high level constructs
need to be mapped correctly over to the HTML, before TeX gets to have
its way with them. (I had some more thoughts in that vein in an old blog
post of mine [4].)

I think each conversion process has done significant work in that
direction, and I find myself wishing that we could reuse and exchange
our bindings more effectively between projects.

> I should not advertise my blog package in the morehype bundle
>     http://ctan.org/pkg/morehype
> (at present) but the original posting provokes the question 
> of what "native" conversion of LaTeX to HTML could be: 
> With blog.sty, the source code is actually parsed by LaTeX 
> (I consider it so perverse to parse LaTeX source code 
>  by non-TeX software), and it "directly" \writes HTML, 
> the TeX macros expand to HTML.

Right, there is a clear difference of perspective there. My view is that
LaTeX's markup should need no further extensions to generate (semantic,
high quality) HTML5, as the alternative steepens the already steep
learning curve, and makes a highly technical writing experience even
more involved. For me, having to re-educate the entire LaTeX authoring
world to write "web-friendly" LaTeX is a worse approach than silently
offering a solution under the hood. And that is something I really
appreciate in lwarp's current vision, as Brian seems to be sheltering
the authors as much as possible.

I think there is an interesting design problem there, and I am always
curious to see the trade-offs that each of these projects decides on.
I would like to learn more about the differences between lwarp and
tex4ht, the amount of work that it took to get to the current stage, and
the estimated difficulty of adding support for new packages. Looking at
lwarp's support list, I see that TikZ -> SVG support is already
operational. Is that building on the tex4ht driver for SVG directly?

Wishing everyone a nice week ahead,

[1] xii.tex example in the latexml showcase

[2] arXMLiv conversion status

Note: the 61% figure is missing a baseline, since it's unclear how
many of these sources run error-free with a stock pdflatex installation.
Apologies for the hanging number.

[3] lwarp 0.12 manual

[4] "LaTeX is Dead (long live LaTeX)" blog post

> Cheers,
>     Uwe.