[tex4ht] what is the fastest way to convert large document to HTML?

Nasser M. Abbasi nma at 12000.org
Fri Aug 17 06:15:47 CEST 2018


On 8/16/2018 7:29 AM, Michal Hoftich wrote:
> Hi Nasser,
> 
>> I am still struggling with very slow latex to HTML conversion
>> for large document with lots of math.
>>
>> using svg for math, the bottleneck is always dvisvgm.
>>
>> For a document of 4,000 pages, with about 50,000 math expressions, it
>> takes 8 hrs on Linux in a Vbox and 2.5 hrs on PC running Linux. It
>> takes less than 2 minutes to compile it using lualatex to PDF.
>>
> 
> It is just too long, it shouldn't take more than few minutes. Maybe
> you can try to play with Dvisvgm options?
> 
> 
>> So I want to get away from dvisvgm. Using png for math is no
>> better. Still just as slow.
>>
>> What other options are there using tex4ht to speed the
>> process of conversion that I could try? Even if the quality of
>> math is not as good as svg.
>>
>> I am only familiar with using svg and png for math images. But
>> I know tex4ht supports other formats and also mathjax.  I now
>> use make4ht to compile.
> 
> The fastest way  for compilation is MathJax. Either MathML + MathJax,
> or LaTeX math + MathJax. I prefer the first option. For second, see
> mathjax-latex-4ht.sty from helpers4ht [1], sample usage [2].
> 
> 
> It is also possible to precompile MathML to HTML + CSS using
> MathJax-node-page [3]. See mathjaxnode make4ht extension in the
> make4ht doc. But from my experience, this may take also quite some
> time, maybe more than Dvisvgm. But it produces best output, without
> need to any JavaScript on the client side.
> 
> Best regards,
> Michal
> 
> [1] https://github.com/michal-h21/helpers4ht
> [2] https://tex.stackexchange.com/a/436928/2891
> [3] https://github.com/pkra/mathjax-node-page/
> 

Thanks Michal;

I spent the last few hours trying MathJax again and found two problems.

The first one I can more or less live with, since compilation is fast
now. The math looks worse than with svg (letter spacing, the integral
sign, etc.), and long pages take a long time to render/load while
MathJax is working, but I can split those long pages if needed.

The second problem is critical. htlatex fails to compile some math
when given the .cfg that tells it to use MathJax and MathML, but it
compiles the same math fine without the .cfg.

This example has math generated by Maple, which uses the old \cases.
There is a workaround for this, and with the workaround the file now
compiles OK with lualatex and htlatex, but not when using mathml/mathjax.

Here is one such example.

------- foo.tex-----
\documentclass[11pt]{article}
\usepackage{amsmath,mathtools,amssymb}

%fix for Maple bad latex generated
%see https://tex.stackexchange.com/questions/191479/how-to-automatically-convert-cases-to-begincases-endcases

\let\amscases\cases
\makeatletter
\def\cases{\@ifnextchar\bgroup\plaincases\amscases}
\def\plaincases#1{\begin{cases*}#1\end{cases*}}
\makeatother

\begin{document}

\begin{align*}
{\frac {1}{\sqrt { \left| y \right| }}}\mathop{\mathrm{d}y}&=
         \mathop{\mathrm{d}x}\\
  \int {\frac {1}{\sqrt { \left| y \right| }}}
      \mathop{\mathrm{d}y}&= \int \mathop{\mathrm{d}x}\\
\cases{-2\,\sqrt {-y}&$y\leq 0$\cr 2\,\sqrt {y}&$0<y$\cr}&=x+C_{{1}}\\
\end{align*}

\end{document}
------- end foo.tex -----

Now the above file compiles OK with

lualatex foo.tex

and with

htlatex foo.tex "htm,charset=utf-8" " -cunihtf -utf8"

But NOT with

htlatex foo.tex "nma.cfg,htm,3,charset=utf-8" " -cunihtf -utf8"

It gives this error:

-------------------------
(/usr/local/texlive/2018/texmf-dist/tex/latex/graphics-cfg/graphics.cfg)
(/usr/local/texlive/2018/texmf-dist/tex/latex/graphics-def/dvips.def))))
l.19 --- TeX4ht warning --- \halign translated into linear text ---
! Missing # inserted in alignment preamble.
<to be read again>
                    &
l.19 \end{align*}

?
-------------------

The nma.cfg file is

--------- nma.cfg ----------------------
\Preamble{mathml}
\Configure{VERSION}{}
   \Configure{DOCTYPE}{\HCode{<!DOCTYPE html>\Hnewline}}
   \Configure{HTML}{\HCode{<html>\Hnewline}}{\HCode{\Hnewline</html>}}
   \Configure{@HEAD}{}
   \Configure{@HEAD}{\HCode{<meta charset="UTF-8" />\Hnewline}}
   \Configure{@HEAD}{\HCode{<meta name="generator" content="TeX4ht
   (http://www.cse.ohio-state.edu/\string~gurari/TeX4ht/)" />\Hnewline}}
   \Configure{@HEAD}{\HCode{<link
            rel="stylesheet" type="text/css"
            href="\expandafter\csname aa:CssFile\endcsname" />\Hnewline}}


\Configure{@HEAD}{\HCode{%
      <script type="text/x-mathjax-config">
        MathJax.Hub.Config({
          extensions: ["tex2jax.js"],
          jax: ["input/TeX", "output/HTML-CSS"],
          tex2jax: {
            \unexpanded{inlineMath: [ ['$','$'], ["\\(","\\)"] ],}
            \unexpanded{displayMath: [ ['$$','$$'], ["\\[","\\]"] ],}
            processEscapes: true
          },
          "HTML-CSS": { fonts: ["TeX"] }
        });
      </script>
}}

\Configure{@HEAD}{\HCode{<script type="text/javascript"
     src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
}}

\Configure{@HEAD}{\HCode{<style type="text/css">
     .MathJax_MathML {text-indent: 0;}
</style>}}


\begin{document}


\EndPreamble
--------------------- end nma.cfg -------------

I do not know whether this is a known bug, or whether there is a way
to fix it for mathml.
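
One way to narrow this down might be a variant of foo.tex without the
\cases redefinition, where the Maple output is rewritten by hand using
the amsmath cases environment. The sketch below is untested with the
mathml configuration, and the file name bar.tex is only a placeholder.
It assumes the redefined \cases, rather than align* itself, is what
triggers the error.

```latex
% bar.tex: hypothetical variant of foo.tex for isolating the failure.
% The \cases redefinition is dropped, and the Maple-style \cases{...}
% is rewritten by hand as an amsmath cases environment.
\documentclass[11pt]{article}
\usepackage{amsmath,mathtools,amssymb}

\begin{document}

\begin{align*}
\begin{cases}
  -2\,\sqrt{-y} & y\leq 0 \\
  2\,\sqrt{y}   & 0<y
\end{cases}
&= x+C_{1}
\end{align*}

\end{document}
```

If this compiles with the same command as above (with foo.tex replaced
by bar.tex), the problem is likely in how the redefined \cases
interacts with the mathml configuration. If it still fails, the
problem is elsewhere.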

So I am back to square one, trying to find a faster way to
compile to HTML.

I wish someone could look at speeding up dvisvgm. htlatex is
fast until it gets to the stage where dvisvgm kicks in, and then it
becomes very slow on large DVI files with lots of math.

Most of the files I have are very large, since they contain lots
of math generated by a CAS, so it takes days to compile them
to HTML using htlatex where lualatex takes only minutes.

Thanks
--Nasser

