[l2h] Bug (+ proposed fix): alltt and multiline matching

Igor Pechtchanski pechtcha at cs.nyu.edu
Tue Jan 11 04:22:35 CET 2005


I'm reposting the message below, both because I think it's important to have
this fixed, and because I've just discovered another (more annoying) instance
of this bug:

----------------- BEGIN bug4.tex -----------------
\documentclass{article}
\usepackage{alltt}
\newcommand{\nothing}[1]{hard[#1]}
\begin{document}
\par
Try~\nothing{er}.
\end{document}
------------------ END bug4.tex ------------------

An extra space is inserted between 'Try~' and 'hard[er]' (HTML snippet
below).  This is due to the rules at the end of substitute_newcmd in the
latex2html script, namely

# Handle the cases as depicted in the description of new command
# substitution.
    local($befdel,$aftdel);
    $befdel = ' '
	if ($before=~/(^|[^\\])\\[a-zA-Z]+$/ && /^$/ && $after=~/^[a-zA-Z]/) ||
	    ($before=~/(^|[^\\])\\[a-zA-Z]+$/ && /^[a-zA-Z]/);

Here, $before=~/(^|[^\\])\\[a-zA-Z]+$/ matches the "\par" (incorrectly!)
because the match is done in multi-line mode rather than single-line mode,
so "$" matches at the end of the "\par" line instead of only at the end of
$before.  Since /^[a-zA-Z]/ also matches (correctly), the space is inserted.
Unlike the earlier case (quoted below), there is no nice workaround, because
the regular expression is hard-coded into the script.
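
The effect can be reproduced outside latex2html.  The following is only a
minimal sketch: the contents of $before are an assumption about what the
script holds at that point, and the /m flag stands in for the global "$* = 1"
setting, which modern Perls no longer support.

```perl
use strict;
use warnings;

# Assumed text preceding the macro call in bug4.tex: "\par" ends an
# earlier line, and the current line ends in "Try~".
my $before = "\\par\nTry~";

# Same pattern as in substitute_newcmd.  The only difference between the
# two matches is the /m flag, standing in for the effect of "$* = 1".
my $single = $before =~ /(^|[^\\])\\[a-zA-Z]+$/  ? 1 : 0;
my $multi  = $before =~ /(^|[^\\])\\[a-zA-Z]+$/m ? 1 : 0;

print "single-line: $single, multi-line: $multi\n";
```

In single-line mode the pattern does not match, because $before does not end
in a control word; in multi-line mode "$" is also allowed to match just
before the embedded newline, so the "\par" is picked up.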

For completeness' sake, here's the HTML produced by the above code
(snipped):

----------------- BEGIN bug4-bad.html -----------------
<META NAME="Generator" CONTENT="jLaTeX2HTML v2002 JA patch-1.4">
...
<BODY >
...
<P>
Try&nbsp; hard[er].
<BR><HR>
------------------ END bug4-bad.html ------------------

I believe I finally found the reason for this.  The file alltt.perl sets
$* to 1 in preprocess_alltt() and doesn't reset it.  This causes all
regular expression matches from that point on to be in multi-line mode.
Something like

local ($saveMLM) = $*;
...
$* = $saveMLM;

in the same places that $/ is saved and restored within preprocess_alltt()
should do it.  Restoring $* may have other consequences, though, if alltt
relies on multi-line matching to work properly.  Please let me know if you
need more details, or an actual patch.
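
As an illustration of the save-and-restore idea, here is a small sketch that
uses "local" on $/ (the variable the text says is already saved and
restored).  The function name is hypothetical and this is not code from
alltt.perl; on a Perl old enough to still support $*, the same idiom would
presumably apply to that variable as well.

```perl
use strict;
use warnings;

# Hypothetical stand-in for preprocess_alltt(): it needs a different
# setting of a global special variable, but only while it is running.
sub slurp_mode_demo {
    local $/ = undef;    # the previous value is saved here...
    return defined($/) ? 'set' : 'undef';
}                        # ...and restored automatically on scope exit

my $before_call = $/;
my $inside      = slurp_mode_demo();
my $after_call  = $/;

my $restored = $after_call eq $before_call ? 'yes' : 'no';
print "inside: $inside, restored: $restored\n";
```

Using "local" avoids the separate save variable and restores the value even
if the function returns early.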
	Igor

On Wed, 29 Dec 2004, Igor Pechtchanski wrote:

> Hi,
>
> There is some strange interaction between the alltt style file and
> unquoted (or, rather, ungrouped) command arguments.  The following
> document illustrates the bug:
>
> ----------------- BEGIN bug3.tex -----------------
> \documentclass{article}
> \usepackage{alltt}
> \newcommand{\eatone}[1]{Eating '{#1}'...}
> \begin{document}
> \eatone12345
> {test}
> \end{document}
> ------------------ END bug3.tex ------------------
>
> This produces the following HTML (snipped):
>
> ----------------- BEGIN bug3-bad.html -----------------
> <META NAME="Generator" CONTENT="jLaTeX2HTML v2002 JA patch-1.4">
> ...
> <DIV ALIGN="LEFT">
> <TT>
> Eating 'test'...12345
>
> </TT>
> </DIV>
> ------------------ END bug3-bad.html ------------------
>
> This only occurs if the "alltt" package is loaded and there is a line in
> the input file that follows the command and begins with '{'.
>
> The problem seems to be that $next_pair_rx (and $next_pair_pr_rx), used in
> the argument processing code, match in multiline mode, and, since they
> begin with "^", they will match the start of *any* line, not just the
> start of the string representing the rest of the document ($after).
>
> One possible fix for this is to prepend "(?s-m)" to $next_pair_rx and
> $next_pair_pr_rx.  This does indeed fix the problem (for me), and produces
> the following (correct) HTML (again, snipped):
>
> ----------------- BEGIN bug3-good.html -----------------
> <META NAME="Generator" CONTENT="jLaTeX2HTML v2002 JA patch-1.4">
> ...
> <DIV ALIGN="LEFT">
> <TT>
> Eating '1'...2345
> test
> </TT>
> </DIV>
> ------------------ END bug3-good.html ------------------
>
> Please let me know if you need more details, or an actual patch.
> 	Igor
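
The "(?s-m)" fix from the quoted message can be sketched in isolation.  The
pattern below is a hypothetical, much-simplified stand-in for $next_pair_rx
(the real one is more involved), and the /m flag again stands in for the
global "$* = 1" setting.

```perl
use strict;
use warnings;

# Hypothetical, simplified stand-in for $next_pair_rx: a "{...}" group
# anchored at the start of the remaining input.
my $pair_rx = '^\{([^{}]*)\}';

# Assumed remaining document text after "\eatone" in bug3.tex.
my $after = "12345\n{test}\n";

# /m stands in for "$* = 1": "^" may now match at the start of any line.
my ($bad) = $after =~ /$pair_rx/m;

# With "(?s-m)" prepended, "^" is pinned to the start of the string again,
# even though the match itself is still requested with /m.
my ($good) = $after =~ /(?s-m)$pair_rx/m;

my $bad_text  = defined($bad)  ? $bad  : 'no match';
my $good_text = defined($good) ? $good : 'no match';
print "multi-line: $bad_text; with (?s-m): $good_text\n";
```

Without the prefix the group on the second line is wrongly taken as the
argument; with it, no brace group is found at the start of $after, which
leaves the caller free to take the single token "1" instead.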

-- 
				http://cs.nyu.edu/~pechtcha/
      |\      _,,,---,,_		pechtcha at cs.nyu.edu
ZZZzz /,`.-'`'    -.  ;-;;,_		igor at watson.ibm.com
     |,4-  ) )-,_. ,\ (  `'-'		Igor Pechtchanski, Ph.D.
    '---''(_/--'  `-'\_) fL	a.k.a JaguaR-R-R-r-r-r-.-.-.  Meow!

"The Sun will pass between the Earth and the Moon tonight for a total
Lunar eclipse..." -- WCBS Radio Newsbrief, Oct 27 2004, 12:01 pm EDT


