Thu, 9 Aug 2001 17:32:13 -0700 (PDT)
Sorry about the belated response.
On Thu, 9 Aug 2001, Franco Bagnoli wrote:
> I have been planning for a long time to try to improve the speed of
> latex2html. Up to now I have been unable to find the time, but who knows...
> The bottleneck of l2h is running LaTeX, converting to PS, cropping, etc., so I
> think that one could improve performance a lot by caching the
> images. And the benefit of caching grows as more documents are processed
> (at present, caching only occurs for subsequent versions of the same document).
One option might be to create a single PS file plus an array of frame
coordinates, one frame per formula in it, but that reaches much
farther back into the code path than I have gotten, or will get
for a while, I think. :)
If by caching you mean removing temp files and storing more data
in variables... my focus is going to be on learning the code
for a while yet, working out the strategy from the implementation.
> My idea is to use l2h as a "server", or with a server part, so that one
> can benefit from site-wide sharing of caches. In this way it would also be
> possible to develop a (quick) mod_latex module for Apache, so that one need
> not convert LaTeX documents to HTML, only put the source in some
> publicly accessible directory (since it is processed by a module, the
> source is unavailable for direct download).
I like that idea very much. That sort of flexibility is nice...
> The idea of the cache is very similar to what happens at present with
> images.pl: use the LaTeX fragment as the key to access the image part.
> Since in the server case one has to deal with concurrent access, it may be
> necessary to use a database to store the data; the choice of the actual
> database can be made transparent using the tie mechanism.
Thanks for the tip, I'll peek into images.pl. I'm still working
out the path a .tex file takes through the code files.
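To make the fragment-as-key idea concrete, here is a rough sketch in
Python rather than Perl. It is not how images.pl actually works. The
names fragment_key, ImageCache and the render callback are made up for
illustration, and collapsing whitespace before hashing is my own
assumption (it would be wrong for verbatim material). The store can be
anything dict-like, which is roughly the transparency that tie gives
you in Perl.

```python
import hashlib
from typing import Callable, MutableMapping


def fragment_key(fragment: str) -> str:
    """Return a stable key for a LaTeX fragment.

    Assumption: runs of whitespace are not significant, so they are
    collapsed before hashing.
    """
    normalized = " ".join(fragment.split())
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


class ImageCache:
    """Look up rendered images by LaTeX fragment; render only on a miss."""

    def __init__(self, store: MutableMapping[str, bytes],
                 render: Callable[[str], bytes]) -> None:
        self.store = store    # a dict here; any other mapping would do
        self.render = render  # hypothetical: fragment in, image bytes out
        self.misses = 0

    def get(self, fragment: str) -> bytes:
        key = fragment_key(fragment)
        if key not in self.store:
            # Cache miss: do the expensive work once and remember it.
            self.misses += 1
            self.store[key] = self.render(fragment)
        return self.store[key]
```

Because the key depends only on the fragment, two different documents
containing the same formula would share one cached image. Concurrent
access is not handled here; that would be the job of whatever database
sits behind the mapping.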
> One could also think of developing a dvi-to-html driver to bypass Ghostscript
> and cropping, much in the way tex4ht works (is that true?): the dvi says
> where to put a given entity, and since the image of the entity is present
> in the cache (or is generated) it should be possible to process it using
> the ImageMagick or GD module.
Bypassing gs alone would give a big speed boost. Using GD sounds
like a great idea, and I'm already familiar with it. It would need
some major reworking of the code, though, as I see after reading Ross
Moore's comments on how an image is constructed.
> Moreover, one could benefit from the mod_perl feature of having
> a perl interpreter (and l2h functions) always in memory. This would imply
> rewriting l2h so as to avoid global writable variables (i.e., an
> object-oriented approach).
> I'm writing down these ideas in order to have your opinion, and also to
> suggest that people thinking about a large rewrite of latex2html consider
> leaving the appropriate "slots" in the code:
> 1) persistence of code
Not sure what you mean here. Something like headers/footers?
> 2) accessing the database of cache
A Tie:: module could do that (or DBI).
> 3) processing of latex pieces
Regarding this, Randal Schwartz published an article in
Web Techniques in which he describes an image module for Apache,
used for page decorations, that is triggered in response to 404s
and creates the image on the fly. The image properties were
determined from the basename of the image. Future requests wouldn't
trigger the script because the image was stored on disk after the
first request was served. I think an image server using something
like images.pl's tag conventions as the image naming scheme could be
done (it probably already has been?).
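A rough sketch of that generate-on-miss pattern, in Python and only
loosely following the description above. It is not the code from the
article. The naming scheme (name_WIDTHxHEIGHT.png), serve_image,
parse_basename and the make_image callback are all hypothetical; the
"file is missing" branch stands in for the 404 handler.

```python
import os
import re
import tempfile
from typing import Callable, Dict

# Hypothetical naming scheme: <name>_<width>x<height>.png, e.g. "eq12_200x40.png"
NAME_RE = re.compile(r"(?P<name>[A-Za-z0-9]+)_(?P<w>\d+)x(?P<h>\d+)\.png")


def parse_basename(basename: str) -> Dict[str, object]:
    """Derive the image properties from the file name alone."""
    m = NAME_RE.fullmatch(basename)
    if m is None:
        raise ValueError(f"unrecognized image name: {basename!r}")
    return {"name": m.group("name"),
            "width": int(m.group("w")),
            "height": int(m.group("h"))}


def serve_image(root: str, basename: str,
                make_image: Callable[..., bytes]) -> bytes:
    """Return the image bytes, generating and storing them on first request."""
    # Parse first: this also rejects names containing path separators.
    props = parse_basename(basename)
    path = os.path.join(root, basename)
    if os.path.exists(path):
        # Already on disk: a real web server would serve this statically.
        with open(path, "rb") as fh:
            return fh.read()
    # The "404" case: build the image, then store it for future requests.
    data = make_image(**props)
    fd, tmp = tempfile.mkstemp(dir=root)
    with os.fdopen(fd, "wb") as fh:
        fh.write(data)
    os.replace(tmp, path)  # atomic rename, so readers never see a partial file
    return data
```

For formulas, the basename would have to encode (or be a key for) the
LaTeX fragment instead of a width and height, which is where something
like images.pl's conventions would come in.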
This is all really high level, though. I'm going to concentrate on
janitorial stuff for now and, hopefully, remove some of the Perl4-isms.
Uh, I have different stylistic conventions for my own code, though,
and I don't know yet how to deal with that.
There was a time
A wind that blew so young
For this could be the biggest sky
And I could have the faintest idea