[l2h] regexp explosion for large toc on l2h99.2beta

Tue, 28 Dec 1999 19:47:30 +0900

Thanks for your advice.

From: Ross Moore <ross@ics.mq.edu.au>
Subject: Re: [l2h] regexp explosion for large toc on l2h99.2beta
Date: Fri, 24 Dec 1999 09:57:24 +1100 (EST)

> > Commenting out the following loop (lines 12645-12651 of the latex2html
> > script) works around this problem, but I don't think it is the correct
> > solution. (Is the loss of section numbers in the HTML output, despite
> > "-show_section_numbers", caused by this?)
> > 
> >   #    if (%renew_command) {
> >   #        local($key);
> >   #        foreach $key (keys %renew_command) {
> >   #            $raw_arg_cmds{$key} = 1;
> >   #            $raw_arg_cmd_rx =~ s/^(\(\)\\\\\()/$1$key\|/;
> >   #        }
> >   #    }
> >       print "\n" if (/$raw_arg_cmd_rx/);
>  
> This would work if you have a large number of \renewcommand calls
> for the numbering and labelling macros \theenumi and \labelenumi,
> e.g. one for each section heading.

A verbosity-3 listing includes messages like:

-------------------------------------------------
DBM: verbatim open...
DBM: new_command open...
DBM: renew_command open...
DBM: dependent open...
Doing ./model.tex  fork at offset 143264 

DBM: verbatim open...
DBM: new_command open...
DBM: renew_command open...
DBM: dependent open...%%%%%%%%%%%%%%%%%
Info: bracketings found: 376

Processing macros ...,,,
*** redefining \theenumi ***

 *** Warning: 
redefining command \theenumi 
,
*** redefining \labelenumi ***

 *** Warning: 
redefining command \labelenumi

....
-----------------------------------------

and I get similar lines for each \input{}.
Is this expected behaviour or not?
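One possible reading of these logs (not confirmed against the latex2html source) is that the commented-out loop is re-run for every \input file while $raw_arg_cmd_rx keeps its earlier value, so the same \theenumi and \labelenumi alternatives are prepended again on each pass. The Python sketch below only models that assumption: the starting pattern "verb|url" is made up, the 13 passes simply match the number of \input files in the skeleton below, and the "seen" guard is a hypothetical fix, not what LaTeX2HTML does.

```python
import re

# Hypothetical starting value: "()", a literal backslash, then a group of
# command names. It mirrors the "()\\(" prefix the Perl substitution anchors on;
# the names "verb" and "url" are invented for illustration.
INITIAL_RX = r"()\\(verb|url)"

# Same pattern as the Perl s/^(\(\)\\\\\()/$1$key\|/ : capture the prefix.
PREFIX_RE = re.compile(r"^(\(\)\\\\\()")


def prepend_keys(rx, keys):
    """Naive version: prepend every key, like the commented-out loop."""
    for key in keys:
        rx = PREFIX_RE.sub(lambda m: m.group(1) + key + "|", rx, count=1)
    return rx


def prepend_new_keys(rx, keys, seen):
    """Guarded version: prepend a key only the first time it is seen."""
    for key in keys:
        if key in seen:
            continue
        seen.add(key)
        rx = PREFIX_RE.sub(lambda m: m.group(1) + key + "|", rx, count=1)
    return rx


keys = ["theenumi", "labelenumi"]
naive = guarded = INITIAL_RX
seen = set()
for _ in range(13):  # one pass per \input file (illustrative count)
    naive = prepend_keys(naive, keys)
    guarded = prepend_new_keys(guarded, keys, seen)

print(naive.count("theenumi|"), len(naive))      # 13 274
print(guarded.count("theenumi|"), len(guarded))  # 1 34
```

Both patterns still match "\theenumi"; the naive one just grows by the same alternatives on every pass, which would scale with the number of input files or redefinitions.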

My document has a skeleton .tex file with many(?) \input{} lines, like this:

----------------------------------------------------------------------
\documentclass[twoside,fleqn]{report}
\usepackage{makeidx}
\usepackage{graphicx}
\newcommand{\vect}[1]{\mbox{\boldmath $#1$}}

\input{mvpmacro} % private macros which do not include *numi redefinition.

\makeindex
\pagestyle{headings}

\begin{document}

\pagenumbering{roman}
  \input{introduction}
\tableofcontents
\newpage
\listoffigures
\newpage
\listoftables
\newpage
\pagenumbering{arabic}

\part{Part 1}
\chapter{Introduction}
  \input{principle}
  \input{howtorun}
\part{Part 2}
\chapter{Input}
  \input{input}
  \input{option}
  \input{data}
\chapter{xxxxxxx}
  \input{source}
  \input{tally}
\chapter{xxxxxxx}
  \input{physics}
\part{part 3}
  \input{files}

\begin{thebibliography}{999}
\input{reference}
\end{thebibliography}
\appendix
  \input{install}
  \input{tools}

\printindex
\end{document}
---------------------------------------------------------------------

> But then I must ask, why is your source code arranged to do this ?
> Surely you could use the same redefinitions just once, for all the
> entries in the TOC. 

> Can you provide a cut-down example of your source for testing ?

OK, but I may need to remove the Japanese characters first; please wait a bit.

> 
> I suspect you have a macro definition like:
> 
> \def\mysection#1{\begingroup
>  \renewcommand\theenumi{...}%
>  \renewcommand\labelenumi{...}%
>  ...
>  \section...{#1}.....
>  \endgroup}
> 
> so as to get a nice numbering style/layout in LaTeX.
> These style changes have no effect at all in LaTeX2HTML,
> which handles the TOC without reference to \theenumi
> and \labelenumi etc.

I had been using my own style file to make "nice" headings, but
removing it and using the standard style file "report.sty" gives the
same error.

I also found that report.sty (.cls) contains many lines like
"\renewcommand\theenumi{...}".

Is it necessary to concatenate all the .tex files, or to make a style
file using \begin{htmlonly}?

> > PS.
> >   I'm using the Japanese version of LaTeX (ASCII pLaTeX2e) and dvips,
> > but this should not make any difference (I hope?).
>  
> No, it should not.
> But does LaTeX2HTML give you adequate support for Japanese characters
> in your LaTeX documents ?
> I've done no work at all on this aspect.

I'm using LaTeX2HTML for Japanese .tex files on Linux (it may also work
on other UNIX-like systems), taking care of the following things:

(1) Use the "Japanese EUC" character encoding (a 2-byte code with the
8th bit set to "1" in each byte). Perl 5.x can handle 8-bit codes, so
the "japanization" patches that perl4.x needed are unnecessary.

(2) Set the $charset variable to "euc-jp" in .latex2html-init:

   $charset= "euc-jp" ;

Using "iso-2022-jp" (7-bit JIS code) or Unicode as the character set
may be more standards-compliant, but I have not yet tested these
encodings with l2h.
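To make the difference between the two Japanese encodings concrete, here is a small Python sketch (Python is used only for illustration and is not part of LaTeX2HTML). It encodes a short Japanese string with Python's standard "euc_jp" and "iso2022_jp" codecs and checks the bytes: in EUC-JP every byte of these characters has the 8th bit set, while ISO-2022-JP stays 7-bit by switching character sets with escape sequences.

```python
# "日本語" ("Japanese language"): three kanji, each 2 bytes in EUC-JP.
text = "日本語"

euc = text.encode("euc_jp")
jis = text.encode("iso2022_jp")

# EUC-JP: every byte of these characters has the 8th bit (0x80) set.
euc_high_bit = all(b & 0x80 for b in euc)

# ISO-2022-JP: 7-bit only; it starts with the escape sequence ESC $ B
# to switch into JIS X 0208 and switches back to ASCII at the end.
jis_seven_bit = all(b < 0x80 for b in jis)

print(len(text), len(euc), euc_high_bit)  # 3 6 True
print(jis_seven_bit)                      # True
```

A byte-oriented tool that passes 8-bit data through unchanged, as Perl 5 does, can therefore carry EUC-JP text without special patches.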

 ==========================================
  Makoto Sasaki
   The Japan Research Institute Ltd. 
   sasaki@tyo.sci.jri.co.jp