[l2h] An Apparent Byte Size Limit for a Portable Network Graphics (.png) Image File Containing Simplified Chinese Characters Produced by LaTeX2HTML From a .tex File Containing LaTeX and Chinese/Japanese/Korean (CJK) for LaTeX Commands

Ross Moore ross.moore at mq.edu.au
Mon Aug 2 22:29:08 CEST 2010

Hello Pat,

On 03/08/2010, at 3:55 AM, "Pat Somerville" <l_pat_s at hotmail.com> wrote:

> Thank you, Professor Moore, for kindly taking the time to respond to me.  In my .tex file I now have CJK segments that each begin with \begin{CJK}{UTF8}{gbsn} and end with \end{CJK} that are small enough to avoid the problem of a too-tall or too-large a .png file.  Assuming you are correct about some .png images being too tall for a page, the corresponding, problematic, .png size appears to have been between 65.7 KiloBytes (KB) and 75.1 KB. 

LaTeX2HTML, when run in its default mode, creates images of those portions of the document that cannot easily be expressed in HTML using Latin characters. It was written before Unicode was well supported in web browsers, so many characters could not be represented directly.

To make these images, a LaTeX job is run on a file called images.tex, which is constructed especially for this purpose. This file is usually left behind, along with its .log file, after the LaTeX2HTML job has completed. If that is not the case for you, then use the -debug switch. One effect of this switch is to inhibit all the cleanup actions, allowing you to see all of the intermediate files that are needed in a run.
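
As a rough aid for checking what a run left behind, here is a minimal Python sketch. It assumes the output directory contains images.tex, its log as images.log, and the generated .png files side by side. The function name inspect_output_dir is hypothetical, not part of LaTeX2HTML. It also lists zero-byte .png files of the kind reported in this thread.

```python
from pathlib import Path


def inspect_output_dir(out_dir):
    """Summarise intermediate files and empty PNG images in an output directory.

    Assumption: images.tex, images.log and the .png files all sit directly
    inside out_dir (for example the MyFile folder created by the run).
    """
    out = Path(out_dir)
    return {
        # True when the intermediate LaTeX source for the images survived cleanup
        "has_images_tex": (out / "images.tex").is_file(),
        # True when the log of the images.tex run is available for inspection
        "has_images_log": (out / "images.log").is_file(),
        # Names of .png files that were created but contain no data
        "empty_pngs": sorted(
            p.name for p in out.glob("*.png") if p.stat().st_size == 0
        ),
    }
```

Calling inspect_output_dir("MyFile") after a run made with the -debug switch should show whether the intermediate files are present and which images came out empty.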

The paper size for this run of images.tex is usually quite small, say A6 rather than A4, since images are scaled by a factor of approximately 1.6. Also, the orientation is 'landscape'. Larger paper sizes use more memory, to no good effect. You can adjust the paper size, the scaling factor, and the orientation. Read the manual to get the names of the corresponding variables that need to be changed.
Alternatively, keep the pieces that need to be rendered as images small. Here I mean small in physical dimension, not file size.

> By now a number of my output files from failed LaTeX and LaTeX2HTML runs may have been deleted.  But from a failed execution of a command of the form "latex2html...... MyFile.tex" in a folder with a corresponding name of the form MyFile it seems like I had a zero-byte, .png file numbered before a .png file with a size of 107 KiloBytes (KB) or 94 KB.  That is consistent with what you expected.  For the benefit of other readers of this e-mail letter you thought the 0-byte-sized .png  file could occur when the .png file after it would be too tall to fit on a page of output.  I doubt if I have ever encountered a case in my use of LaTeX2HTML in which a mathematical expression in a .png file would have been too tall to fit on a page of, say .dvi output from LaTeX.  It seems to me that a .html output file from an execution of latex2html command ought to be just one Web page long.  If so, this is a curious thing for me.--That is I would expect a .png file to always be shorter in height than the entire, .html, output file from LaTeX2html that uses the .png file; and I think the "Bad file descriptor" errors for generating problematic .png files were generated by LaTeX2HTML 1.70 instead of LaTeX 2e.
> I realize that my understandings of the operations of LaTeX and LaTeX2HTML are limited.  From running the two programs on a file of the form MyFile.tex in a terminal program as a root user I recall that LaTeX can produce a file of the form MyFile.dvi with multiple pages of output. 

Indeed, images.tex produces one page for each required image. These pages are output as separate .ps files by dvips. Then each of these .ps files is rendered to a bitmap by Ghostscript, after which various utilities run to get the image to the proper minimal size and shape. It is a very complicated process, which can fail for various reasons, usually related to peculiarities of the actual input source. In your case, I'd expect that you are using a lot of CJK source, perhaps set vertically. Thus your problems are most probably related to the paper size used with the images.tex run. To verify this you need to look at its .log file, as I said in my first email.
Without seeing that, and the console messages produced by the full LaTeX2HTML run, preferably with the -debug switch, I cannot help you further.
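
A small Python sketch of one way to skim that .log file follows. It relies only on two standard TeX log conventions: error lines begin with "! ", and material too tall for its box is reported as "Overfull \vbox". Whether an oversized CJK image actually shows up this way in images.log is an assumption, and the function names scan_log_text and scan_log_file are hypothetical.

```python
import re
from pathlib import Path

# Standard TeX log conventions (not specific to LaTeX2HTML):
# error messages start with "! ", and boxes that are too tall are
# reported on lines starting with "Overfull \vbox".
ERROR_LINE = re.compile(r"^! ")
OVERFULL_VBOX = re.compile(r"^Overfull \\vbox")


def scan_log_text(text):
    """Return (line_number, kind, line) tuples for suspicious log lines."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        if ERROR_LINE.match(line):
            findings.append((number, "error", line.strip()))
        elif OVERFULL_VBOX.match(line):
            findings.append((number, "overfull-vbox", line.strip()))
    return findings


def scan_log_file(path):
    """Read a log file leniently (logs may contain odd bytes) and scan it."""
    return scan_log_text(Path(path).read_text(errors="replace"))
```

For example, scan_log_file("MyFile/images.log") would list any such lines together with their line numbers, which are the parts of the log worth quoting when asking for help.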

> That design of separating the output into multiple pages is understandable when one wants the MyFile.dvi output file to be printed onto sheets of paper.

The pagination produced by LaTeX on your whole job is quite irrelevant for HTML.

> Basic questions:
> 1.  I often execute a command of the form "latex MyFile.tex" once or twice before executing a command of the form "latex2html.....MyFile.tex".  In this way sometimes I could be made aware of some LaTeX command errors in my file of the form MyFile.tex.  But is first running LaTeX like that absolutely necessary before running LaTeX2HTML?
> 2a.  How is it that LaTeX2HTML could "think" in terms of multiple pages when the .html output file appears to me to be just one, long Web page?
> 2b.  Does LaTeX2HTML rely on the page separations generated by LaTeX in producing a file of the form MyFile.dvi?
> Okay, now I return to the problem of some .png image files containing simplified Chinese characters, .png images which are too tall for a page, assuming you are correct.  I am not sure I have ever encountered this problem for a .png file for a single mathematical expression containing only numbers, mathematical symbols, and/or just a few Greek letters and/or English words.  So the design of putting each mathematical expression or sometimes one Greek letter in one .png file is a good one because it usually avoids this problem.  Apparently the simplified Chinese characters are packaged in groups in .png files with one CJK segment per .png file.  I now see two possible ways in which the problem of too-tall, .png images containing Chinese characters could be avoided by a change in the design of some software:

The images are being made of the whole environment, not separate characters, unless you have added extra command-line options to get the characters separately. Most likely you haven't done anything special. Read the chapter on LaTeX2HTML in the book The LaTeX Web Companion, published by Addison-Wesley, for a description of how to adjust the output HTML that you can produce. In particular, you may be able to do away with the need for images, and use Unicode characters instead. I've never done that for CJK, but others may have done so.

> 1.  Make LaTeX2HTML always "think" of MyFile.html as one, long page; and make it "think" of the length of that page as including all of the .png files the .html file uses.  Managed in this way the problem of a .png file being too tall or too large in byte size should never occur because the .html file should always be as "tall" or "taller" (really long or longer) or contain as many or more bytes as a .png file used by the .html file.
> 2.  Have LaTeX2HTML assign each, different, simplified Chinese character used in the file MyFile.tex file to its own .png file.  This would be similar to the strategy used by LaTeX2HTML for each mathematical expression or isolated Greek letter.  I guess that using a font size for a Chinese character taller than a page of corresponding MyFile.dvi output would either seldom occur or may not even be possible if such a large font size does not exist.
> Meanwhile, if needed, I could in principle continue to break long {CJK} segments in a .tex file into shorter ones to avoid the problem of a .png file that is too tall.  Again thanks for writing to me, Professor Moore.  Oh yes, some more good news is that although the messages I sent to two, different, e-mail addresses attempting to subscribe to a CJK users group failed to be delivered, from an e-mail letter I sent to a different, CJK,  e-mail address for the purpose of discussing this problem I received what appears to have been an automatically generated response informing me that I could receive a future response.
> Pat         

Hope this helps,
