[tex4ht] what is the fastest way to convert large document to HTML?

Martin Gieseking martin.gieseking at uos.de
Mon Aug 20 11:18:40 CEST 2018


On 19.08.2018 at 01:10, Michal Hoftich wrote:
>> However, technically it shouldn't be necessary to convert all math fragments
>> every time the document is processed. In the (proprietary) infrastructure at
>> our university we only convert the portions that have actually changed. This
>> is simply done by using md5-based file names computed from the corresponding
>> LaTeX code. Before starting the actual conversion, the system checks if
>> there's already an SVG file present which matches the hash value of the
>> LaTeX code. If so, running LaTeX and dvisvgm can be skipped. This also has
>> the advantage that every fragment is created only once even if it's
>> referenced multiple times in the document.
>>
> 
> This is actually a great idea. I've created a simple Lua package which can
> process DVI pages and calculate MD5 hashes for their contents. make4ht
> can then rename the files generated by dvisvgm according to the hashes and
> replace the image names in the HTML files. A zip file with all necessary
> files is attached. It can be executed with


Hi Michal,

that looks awesome. I'll have a closer look at your package later today.

Just a first observation: If I understand the dvireader script
correctly, it reads all bytes following a "bop" command until the first
byte with the "eop" value 140 is reached. Since many DVI commands take
additional parameter bytes, it's likely that one of these bytes is 140
as well. In that case the MD5 sum is computed for only part of the page,
i.e. changes in the remaining section wouldn't be recognized.
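
To illustrate the point, here is a minimal Python sketch (not the dvireader code, and not what dvisvgm does) that finds the real end of a page by stepping from opcode to opcode and skipping each command's parameter bytes. The operand lengths follow the standard DVI opcode table. The function names operand_length, page_end and page_md5 are made up for this example. Hashing only the bytes after the 44 "bop" parameter bytes is an assumption on my side, chosen so that page counters and the back pointer don't influence the hash.

```python
import hashlib

BOP, EOP = 139, 140

# First opcode of each group whose members take 1, 2, 3 or 4 operand bytes:
# set, put, right, w, x, down, y, z, fnt
SIZED_GROUPS = (128, 133, 143, 148, 153, 157, 162, 167, 235)
# Single-byte commands outside the 0..127 and 171..234 ranges:
# nop, eop, push, pop, w0, x0, y0, z0
NO_OPERANDS = (138, 140, 141, 142, 147, 152, 161, 166)


def operand_length(data, pos):
    """Return the number of operand bytes that follow the opcode at data[pos]."""
    op = data[pos]
    if op <= 127 or op in NO_OPERANDS or 171 <= op <= 234:
        return 0
    for first in SIZED_GROUPS:
        if first <= op <= first + 3:
            return op - first + 1
    if op in (132, 137):          # set_rule, put_rule: height and width
        return 8
    if op == BOP:                 # ten 4-byte counters plus a 4-byte back pointer
        return 44
    if 239 <= op <= 242:          # xxx1..xxx4: length field, then the special itself
        n = op - 238
        k = int.from_bytes(data[pos + 1:pos + 1 + n], "big")
        return n + k
    if 243 <= op <= 246:          # fnt_def1..4: number, c, s, d, a, l, name
        n = op - 242
        area_len = data[pos + 1 + n + 12]
        name_len = data[pos + 1 + n + 13]
        return n + 14 + area_len + name_len
    raise ValueError(f"unexpected opcode {op} inside a page")


def page_end(data, bop_pos):
    """Return the offset of the eop command that closes the page starting at bop_pos."""
    if data[bop_pos] != BOP:
        raise ValueError("bop_pos does not point to a bop command")
    pos = bop_pos
    while pos < len(data):
        if data[pos] == EOP:
            return pos
        pos += 1 + operand_length(data, pos)
    raise ValueError("no eop found")


def page_md5(data, bop_pos):
    """MD5 of the page body: everything after the bop parameters up to and including eop."""
    end = page_end(data, bop_pos)
    return hashlib.md5(data[bop_pos + 45:end + 1]).hexdigest()
```

For a page consisting of "bop", a "right1" command with the parameter 140, one character and "eop", a plain search for the byte 140 stops at the parameter, whereas page_end returns the position of the actual "eop".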

Perhaps it's also possible to add the computation and comparison of the 
hashes to dvisvgm because it processes the DVI file anyway. I have to 
think about this a bit more.
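
Independent of where the check ends up, the skip-if-present logic from the quoted description is small. The following Python sketch only shows the idea. The function name cached_svg and the render callback are hypothetical, and the callback stands in for the actual LaTeX and dvisvgm run, which is not shown here.

```python
import hashlib
from pathlib import Path


def cached_svg(latex_code, cache_dir, render):
    """Return the path of the SVG for latex_code, converting only if it is missing.

    render is a hypothetical callback that takes the LaTeX code and returns
    the SVG markup as a string.
    """
    digest = hashlib.md5(latex_code.encode("utf-8")).hexdigest()
    target = Path(cache_dir) / f"{digest}.svg"
    if not target.exists():
        # Unknown fragment: run the expensive conversion once and store the result.
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(render(latex_code), encoding="utf-8")
    return target
```

Because the file name depends only on the LaTeX code, a fragment that occurs several times in the document maps to the same file and is converted once.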

Best,
Martin


