[l2h] regexp explosion for large toc on l2h99.2beta
Mako Sasaki
Mako Sasaki <sasaki@tyo.sci.jri.co.jp>
Tue, 28 Dec 1999 19:47:30 +0900
Thanks for your advice.
From: Ross Moore <ross@ics.mq.edu.au>
Subject: Re: [l2h] regexp explosion for large toc on l2h99.2beta
Date: Fri, 24 Dec 1999 09:57:24 +1100 (EST)
> > Commenting out the following loop was a workaround for this problem,
> > but I do not think it is the correct solution. (Is losing the section
> > numbers in the output HTML, despite "-show_section_numbers", caused by
> > this?)
> >
> > 12645 # if (%renew_command) {
> > 12646 # local($key);
> > 12647 # foreach $key (keys %renew_command) {
> > 12648 # $raw_arg_cmds{$key} = 1;
> > 12649 # $raw_arg_cmd_rx =~ s/^(\(\)\\\\\()/$1$key\|/;
> > 12650 # }
> > 12651 # }
> > 12652 print "\n" if (/$raw_arg_cmd_rx/);
>
> This would work if you have a large number of \renewcommand s
> for the numbering and labeling macros \theenumi and \labelenumi
> e.g. one for each section heading.
A listing at verbosity level 3 includes messages like:
-------------------------------------------------
DBM: verbatim open...
DBM: new_command open...
DBM: renew_command open...
DBM: dependent open...
Doing ./model.tex fork at offset 143264
DBM: verbatim open...
DBM: new_command open...
DBM: renew_command open...
DBM: dependent open...%%%%%%%%%%%%%%%%%
Info: bracketings found: 376
Processing macros ...,,,
*** redefining \theenumi ***
*** Warning:
redefining command \theenumi
,
*** redefining \labelenumi ***
*** Warning:
redefining command \labelenumi
....
-----------------------------------------
and I get similar lines for each \input{}.
Is this expected behaviour or not?
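To make the suspected growth concrete, here is a minimal sketch of what the quoted substitution does to the pattern if the loop is re-run on every pass. That the loop really runs once per \input'ed file is an assumption, not something confirmed above. The starting pattern, the pass count, and the %seen guard are hypothetical and are not taken from latex2html.

```perl
use strict;
use warnings;

# Hypothetical starting value: it only mimics the "()\\(" prefix that the
# quoted substitution anchors on; the real pattern in latex2html is longer.
my $start_rx = '()\\\\(includegraphics|epsfig)([^a-zA-Z]|$)';

# Unguarded: the same substitution as in the quoted loop, repeated per pass.
sub grow_unguarded {
    my ($rx, $passes, @keys) = @_;
    for (1 .. $passes) {
        for my $key (@keys) {
            $rx =~ s/^(\(\)\\\\\()/$1$key\|/;
        }
    }
    return $rx;
}

# Guarded: skip keys that were already spliced in (hypothetical %seen hash).
sub grow_guarded {
    my ($rx, $passes, @keys) = @_;
    my %seen;
    for (1 .. $passes) {
        for my $key (@keys) {
            next if $seen{$key}++;
            $rx =~ s/^(\(\)\\\\\()/$1$key\|/;
        }
    }
    return $rx;
}

my @keys      = ('theenumi', 'labelenumi');
my $unguarded = grow_unguarded($start_rx, 20, @keys);
my $guarded   = grow_guarded($start_rx, 20, @keys);

printf "start:     %d chars\n", length $start_rx;
printf "unguarded: %d chars\n", length $unguarded;
printf "guarded:   %d chars\n", length $guarded;
```

In the unguarded version every pass prepends "theenumi|" and "labelenumi|" again, so the alternation grows with the number of passes even though only two commands are redefined. The guarded version adds each key once. This is only one possible way to limit the growth, not necessarily how it should be fixed in latex2html itself.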
My document has a skeleton TeX file with many(?) \input{} lines, like this:
----------------------------------------------------------------------
\documentclass[twoside,fleqn]{report}
\usepackage{makeidx}
\usepackage{graphicx}
\newcommand{\vect}[1]{\mbox{\boldmath $#1$}}
\input{mvpmacro} % private macros which do not include *numi redefinition.
\makeindex
\pagestyle{headings}
\begin{document}
\pagenumbering{roman}
\input{introduction}
\tableofcontents
\newpage
\listoffigures
\newpage
\listoftables
\newpage
\pagenumbering{arabic}
\part{Part 1}
\chapter{Introduction}
\input{principle}
\input{howtorun}
\part{Part 2}
\chapter{Input}
\input{input}
\input{option}
\input{data}
\chapter{xxxxxxx}
\input{source}
\input{tally}
\chapter{xxxxxxx}
\input{physics}
\part{part 3}
\input{files}
\begin{thebibliography}{999}
\input{reference}
\end{thebibliography}
\appendix
\input{install}
\input{tools}
\printindex
\end{document}
---------------------------------------------------------------------
> But then I must ask, why is your source code arranged to do this ?
> Surely you could use the same redefinitions just once, for all the
> entries in the TOC.
> Can you provide a cut-down example of your source for testing ?
OK, but I may need to remove the Japanese characters first ... please wait a bit.
>
> I suspect you have a macro definition like:
>
> \def\mysection#1{\begingroup
> \renewcommand\theenumi{...}%
> \renewcommand\labelenumi{...}%
> ...
> \section...{#1}.....
> \endgroup}
>
> so as to get a nice numbering style/layout in LaTeX.
> These style changes have no effect at all in LaTeX2HTML,
> which handles the TOC without reference to \theenumi
> and \labelenumi etc.
I have been using my own style file to make "nice" headings, but
removing it and using the standard style file "report.sty" results in the
same error.
I also found that report.sty (.cls) contains many lines like
"\renewcommand\theenumi{...}".
Is it necessary to concatenate all the TeX files, or to make a style file
that uses \begin{htmlonly}?
> > PS.
> > I'm using a Japanese version of LaTeX (ASCII pLaTeX2e) and dvips,
> > but this should not make any difference (I hope?).
>
> No, it should not.
> But does LaTeX2HTML give you adequate support for Japanese characters
> in your LaTeX documents ?
> I've done no work at all on this aspect.
I'm using LaTeX2HTML for Japanese TeX files on Linux (it may also work on
other UNIX-like systems), taking care of the following things:
(1) Use the "Japanese EUC" character code (a 2-byte code with the 8th bit
set to "1" in each byte). Perl 5.x can handle 8-bit codes, so none of the
"japanization" patches needed for Perl 4.x are required.
(2) Set the $charset variable to "euc-jp" in .latex2html-init:
$charset= "euc-jp" ;
Using "iso-2022-jp" (7-bit JIS code) or Unicode as the character set may
be more standards-compliant, but I have not yet tested these encodings
with l2h.
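For reference, a minimal .latex2html-init containing only the setting from step (2) might look like the sketch below. The file is read as Perl code, so, as far as I know, it should end with a true value. The comments and layout are illustrative only.

```perl
# .latex2html-init (read by latex2html as Perl code)

# Character set announced in the generated HTML; value taken from step (2).
$charset = "euc-jp";

1;  # assumed requirement: the init file should return a true value
```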
==========================================
Makoto Sasaki
The Japan Research Institute Ltd.
sasaki@tyo.sci.jri.co.jp