[l2h] avoiding conversion of "--" to "-"

Ross Moore ross at ics.mq.edu.au
Tue Dec 30 00:21:18 CET 2003


Hi Fred,

On Wed, 17 Dec 2003, Fred L. Drake, Jr. wrote:

>
> I'm using LaTeX2HTML to convert programmer's API documentation using a
> fair bit of custom Perl code.
>
> Over the years, I've dealt with many places in our documentation where
> the text "--" is contained as content rather than as markup for an
> en-dash.  In each case, I've avoided the en-dash conversion by adding
> still more markup in the document text.  While less than ideal, it has
> worked.
>
> I recently decided it was time to tackle this problem in a more
> general way.
>
> In each case where I've needed to deal with this conversion, the
> affected "--" has occurred in content which is known to never need the
> conversion based on the surrounding markup.  What I've tried to do in
> these cases is to convert the "--" to some other HTML spelling of
> those two characters; I've tried both "&#45;&#45;" and the XHTML-ish
> "&#x2d;&#x2d;".  In both cases, the conversion still takes place.
>
> Where in LaTeX2HTML is this conversion being done?  Is there some way
> to suppress this without an uglier transformation of the "--"?


It happens in the &text_cleanup routine:

# This routine must be called once on the text only,
# else it will "eat up" sensitive constructs.
sub text_cleanup {
    # MRO: replaced $* with /m
    s/(\s*\n){3,}/\n\n/gom;     # Replace consecutive blank lines with one
    s/<(\/?)P>\s*(\w)/<$1P>\n$2/gom;  # clean up paragraph starts and ends
    s/$O\d+$C//go;              # Get rid of bracket id's
    s/$OP\d+$CP//go;            # Get rid of processed bracket id's
    s/(<!)?--?(>)?/(length($1) || length($2)) ? "$1--$2" : "-"/ge;
      ^^^^^^^^^^^^_________________________________^^_______^
    Here's the pattern: HTML comment delimiters (<!-- and -->) pass
    unchanged; a "--" that is not attached to "<!" or ">" is
    contracted to "-".
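
To see the effect of that one line in isolation, here is a minimal
sketch that copies the pattern into a stand-alone function. The name
contract_dashes is made up for illustration, and the "// ''" guards
(Perl 5.10 or later) are an addition to keep "use warnings" quiet;
neither is part of LaTeX2HTML.

```perl
use strict;
use warnings;

# Hypothetical helper: applies only the dash substitution quoted above
# to a copy of the given string and returns the result.
sub contract_dashes {
    my ($text) = @_;
    $text =~ s{(<!)?--?(>)?}{
        # $1 is "<!" when the dashes open an HTML comment,
        # $2 is ">" when they close one; both may be undefined.
        my ($open, $close) = ($1 // '', $2 // '');
        (length($open) || length($close)) ? "$open--$close" : "-";
    }ge;
    return $text;
}

print contract_dashes('foo -- bar'), "\n";          # foo - bar
print contract_dashes('<!-- a comment -->'), "\n";  # <!-- a comment -->
```

The second call shows why the optional groups are there: without them
the comment delimiters would be contracted as well.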

>
> I will note that converting "--" to "-<span>-</span>" or
> "-<!--junk-->-" works, but both are incredibly ugly ways of doing
> this.

Sure.
I'd suggest that you replace the above line by a subroutine call,
then define the subroutine to do whatever replacements you think
are best for you -- perhaps none at all.
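
A minimal sketch of that suggestion, under stated assumptions: the
names contract_dashes, $dash_cleanup, text_cleanup_sketch and cleaned
are all hypothetical, and text_cleanup_sketch only stands in for the
real &text_cleanup routine. Like the original, the substitution works
on $_.

```perl
use strict;
use warnings;

# Default behaviour: the same contraction text_cleanup performs today.
sub contract_dashes {
    s{(<!)?--?(>)?}{
        my ($open, $close) = ($1 // '', $2 // '');
        (length($open) || length($close)) ? "$open--$close" : "-";
    }ge;
}

# Hypothetical hook; site-specific code could point it elsewhere.
our $dash_cleanup = \&contract_dashes;

sub text_cleanup_sketch {
    # ... the earlier substitutions of text_cleanup would stay here ...
    $dash_cleanup->();    # stands in for the s/(<!)?--?(>)?/.../ge line
}

# Demonstration helper: run the sketch on a copy of a string.
sub cleaned {
    local $_ = shift;
    text_cleanup_sketch();
    return $_;
}

print cleaned("run with --verbose"), "\n";   # run with -verbose
$dash_cleanup = sub { };                     # "perhaps none at all"
print cleaned("run with --verbose"), "\n";   # run with --verbose
```

Routing the call through a code reference is only one way to make the
step replaceable; redefining a named subroutine would do as well.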


Theoretically, this replacement is in the wrong place, since it acts
on 'output' rather than on 'input', as it would with a TeX engine.
But I cannot find a better place for it, since it needs to act on the
result of macro expansions, as well as the normal text of the document.

Currently the replacement acts *after* all macro expansions have been
done, and all environments have been processed, but *before* verbatim
strings (and other 'sensitive' marked constructs) are re-inserted into
the document.

You say...

> In each case where I've needed to deal with this conversion, the
> affected "--" has occurred in content which is known to never need the
> conversion based on the surrounding markup.  What I've tried to do in

So perhaps you should be using a construct that creates a 'sensitive'
marker for the whole block of content, in a similar way to how
verbatim-like environments are handled. (These have their content
stored in a database, and a 'marker' inserted into the document;
to be replaced much later by the &replace_sensitive_markers routine.)
However, environments like {alltt} cannot be handled this way, because
macros inside them still need to be expanded.
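
The general idea can be sketched as follows. This is not the
LaTeX2HTML implementation: the %stash hash, the marker format and the
names protect, restore and contract_dashes are assumptions made for
illustration only. The point is the ordering: protected content is
swapped out for a marker before the dash contraction runs, and swapped
back in afterwards.

```perl
use strict;
use warnings;

my %stash;          # stands in for the database of protected content
my $next_id = 0;

# Store the content and return an opaque marker (made-up format that
# contains no "-" and so is untouched by the dash contraction).
sub protect {
    my ($content) = @_;
    my $id = ++$next_id;
    $stash{$id} = $content;
    return "<<SENSITIVE:$id>>";
}

# Put the stored content back in place of each marker.
sub restore {
    my ($text) = @_;
    $text =~ s/<<SENSITIVE:(\d+)>>/$stash{$1}/g;
    return $text;
}

# The dash contraction from text_cleanup, applied to a copy.
sub contract_dashes {
    my ($text) = @_;
    $text =~ s{(<!)?--?(>)?}{
        my ($open, $close) = ($1 // '', $2 // '');
        (length($open) || length($close)) ? "$open--$close" : "-";
    }ge;
    return $text;
}

my $doc = "Pages 1--2. Option: " . protect("--verbose");
$doc = restore(contract_dashes($doc));
print "$doc\n";     # Pages 1-2. Option: --verbose
```

Because the stored content never passes through the cleanup step, any
macros inside it would not be expanded either, which is the limitation
noted above for {alltt}.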


Any further ideas would be welcome.


>
> Thanks!
>
Cheers,

	Happy New Year

		Ross


>
>   -Fred
>
> --
> Fred L. Drake, Jr.  <fdrake at acm.org>
> PythonLabs at Zope Corporation

