[l2h] Bug in LaTeX2HTML: ligature-breaking chars don't work

Anthony Fok <foka@ualberta.ca>
Sat, 1 Jul 2000 08:48:19 -0600


Hello Ross,

On Thu, Jun 29, 2000 at 09:44:32PM +1000, Ross Moore wrote:
> No, that is not correct.
> If the environment is defined correctly then a \verb certainly can
> be used within it.
> 
> The fact that you are getting mixed-up with is that \verb cannot
> be used "within the argument to a macro".

Oh, yes, indeed, silly me.  :-)  However, I couldn't quite use \verb
in many places in the debian-guide*.tex file either, because they looked
like this:  \texttt{some text \textbf{dfdfd---}}.  I should probably
double-check to see exactly what the code looked like, though.
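
For illustration, a minimal sketch of that distinction.  The sample
strings are made up for this example, and it concerns LaTeX only, not
what LaTeX2HTML does with the same input:

```latex
\documentclass{article}
\begin{document}

% Fine: \verb used directly in running text.
Permissions: \verb|-rw-rw-r--|

% Not allowed: \verb inside the argument of another command.
% \textbf{\verb|-rw-rw-r--|}

% Inside an argument, the hyphens have to be kept apart by other
% means, e.g. the italic correction \/ :
\texttt{some text \textbf{-rw-rw-r-\/-}}

\end{document}
```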

> For your {lyxcode} environment, you really want it to be treated
> exactly as a {verbatim} environment. You want the contents to be
> presented verbatim.
> 
> In HTML, you want the pre-formatted tags:
> 
> <PRE>
>   $ ls -l
>
>   total 48488
> 
>   -rw-rw-r-\/-    1 foka     foka         3939 May 23 23:19 02.txt.1
> </PRE>
> 
> You can achieve this, with minimal changes, by including the
> following lines in the preamble of your document:
> 
> \usepackage{html}
> \includecomment{lyxcode}
>
> this should make {lyxcode} into a verbatim-like environment
> for both LaTeX and LaTeX2HTML.
> If you don't want to alter anything for LaTeX,
> then just use the lines:
> 
> \usepackage{html}
> \begin{htmlonly}
> \includecomment{lyxcode}
> \end{htmlonly}

Hehe, actually, lyxcode is not entirely a verbatim environment, as it
allows bolded text, etc.  However, I have to admit some of the
debian-guide*.tex source is kind of ugly when I examine it by hand:
the lyxcode environment is defined as a list environment, etc.

I was not the original author of the debian-guide*.tex, so I wasn't
prepared to do an overhaul manually only to find I've messed up its
appearance somewhere.  Maybe I'll try that again.  I just hope whatever
I changed will still work okay with LyX.  :-)

> > Anyhow, the above would be translated to "-rw-rw-r-", i.e. LaTeX2HTML
> > neglects the \/, changing -\/- to --, which got interpreted as an en-dash,
> > and becomes "-" in the final HTML output.
> 
> That's only because it thinks that it is processing ordinary text.
> If you have made your environment into a verbatim-like one,
> then nothing is touched at all.

Yes, indeed.  But again, things like \textbf{\verb|-rw-rw-r--|} are a big
no-no.  I'll have to double-check to see if they used something like
that in the source.

> > > Yeah,  \textcompwordmark  isn't recognised by LaTeX2HTML.
> > > I'll put that on the TODO list.
> > 
> > Thanks a lot.  If you could fix the "-\/-\/-" bug too, that would be
> > wonderful.  Thanks!  :-)
> 
> Sorry; that isn't a bug.
> LaTeX2HTML isn't a state-machine the way TeX is, so you cannot locally
> change the interpretation of short character sequences. 
> (If you really want to rely on TeX's processing model,
> then try grappling with the TeX4ht program;
> though I suspect that you will have other difficulties with that.)

Hmmm... well then, could it be documented explicitly in the manual? 
(Or was it mentioned already?)  It is an undocumented behaviour that is
contrary to most users' expectation, and that makes it a "bug"...  unless it
is documented, say, in the troubleshooting section.

> You need to recognise that you want {lyxcode} to map to HTML
> <PRE>....</PRE> tags. 
> Something needs to tell this to the LaTeX2HTML processor.
> That is the purpose of the  \includecomment  command.
> (See the package  comment.sty  to find out how this works for LaTeX.)

Thanks a lot, I'll keep that in mind.  This lyxcode environment is
quite ugly though:

\newenvironment{lyxcode}
  {\begin{list}{}{
    \setlength{\rightmargin}{\leftmargin}
    \raggedright
    \setlength{\itemsep}{0pt}
    \setlength{\parsep}{0pt}
    \ttfamily}%
   \item[]}
  {\end{list}}

\begin{lyxcode}
D:\textbackslash{}DISTS\textbackslash{}SLINK\textbackslash{}MAIN\textbackslash{}
DISKS-I386\textbackslash{}

2.1.8-1999-02-22>\textbf{rawrite2}

RaWrite~2.0~-~Write~disk~file~to

raw~floppy~diskette

Enter~disk~image~source~file~name:~\textbf{drv1440.bin}

Enter~target~diskette~drive:~\textbf{a:}

Please~insert~a~formatted~diskette~into

drive~A:~and~press~-ENTER-~:
\end{lyxcode}

The above is supposed to produce:

	D:\DISTS\SLINK\MAIN\DISKS-I386\
	2.1.8-1999-02-22>rawrite2
	RaWrite 2.0 - Write disk file to
	raw floppy diskette
	Enter disk image source file name: drv1440.bin
	Enter target diskette drive: a:
	Please insert a formatted diskette into
	drive A: and press -ENTER- :

where rawrite2, drv1440.bin and a: are boldfaced (to show what the user
types).  So, a pure verbatim environment won't work.  But then again, I
find the current code (all the \textbackslash{} etc.) very ugly for
hand-maintenance indeed.  :-p
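
One possible middle ground, offered only as a sketch: the standard
alltt package provides a verbatim-like environment in which \, { and }
keep their usual meaning, so \textbf still works while spaces and line
breaks are kept as typed.  Whether LaTeX2HTML and LyX cope with this
for debian-guide*.tex is an assumption that would need testing, and the
T1 font encoding is a choice made for this example only:

```latex
\documentclass{article}
\usepackage[T1]{fontenc}  % assumption: T1, so \textbackslash is a font glyph
\usepackage{alltt}
\begin{document}

% Spaces and line breaks are literal here; commands are still active.
% Bold typewriter only shows if the font family has a bold series.
\begin{alltt}
D:\textbackslash{}DISTS\textbackslash{}SLINK\textbackslash{}MAIN\textbackslash{}DISKS-I386\textbackslash{}
2.1.8-1999-02-22>\textbf{rawrite2}
RaWrite 2.0 - Write disk file to
raw floppy diskette
Enter disk image source file name: \textbf{drv1440.bin}
Enter target diskette drive: \textbf{a:}
Please insert a formatted diskette into
drive A: and press -ENTER- :
\end{alltt}

\end{document}
```

The backslashes still need \textbackslash{}, but the ~ and blank-line
markup goes away.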

> > Incidentally, there's a similar example where \/ didn't work, but
> > \textcompwordmark worked.  e.g.
> > 
> > 	date >> ~/logout-time
> > 
> > is coded in the original TeX source as:
> > 
> > 	date~>\textcompwordmark{}>~\~{}/logout-time
> 
> That is an ugly kludge, which makes your source code unnecessarily
> unreadable.

I know.  I didn't do it; it was there in the first place.  I tried
changing it to ">\/>", but that doesn't work.

You mentioned that LaTeX2HTML doesn't support things like "-\/-\/-",
and yet ">\textcompwordmark{}>" works, as ugly as it may seem.
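
For reference, the two ligature-breaking spellings side by side as
LaTeX sees them.  This sketch only concerns LaTeX itself and says
nothing about how LaTeX2HTML treats either form; the T1 encoding and
\ttfamily are assumptions made for the example:

```latex
\documentclass{article}
\usepackage[T1]{fontenc}  % assumption: a T1 font with a >> ligature
\begin{document}
\ttfamily

% Uninterrupted: >> may be set as a single guillemet glyph.
date~>>~\~{}/logout-time

% Two ways of keeping the two characters apart in LaTeX:
date~>\textcompwordmark{}>~\~{}/logout-time

date~>\/>~\~{}/logout-time

\end{document}
```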

> All you have to do is define the {lyxcode} environment properly
> and all your problems will be solved...
> 
>   .. and your source documents will be readable/editable.

I know.  It is not my own document though, and it will take quite a bit
of time to fix all the lyxcode environments (hundreds of them?)
throughout the document.

> > And it works great!  I tried to substitute \textcompwordmark with \/,
> > and the ">>" got lumped into a ligature in the LaTeX2HTML output.
> > Hmm...
> 
> Yep. \textcompwordmark got ignored, since it is undefined.
> Then >> probably became the French guillemet, or the math \gg symbol,
> depending upon what language you've specified.

Yes, indeed.  Hmm... if \textcompwordmark is ignored, and that is what
makes '>\textcompwordmark{}>' work, can't you just make "\/" undefined,
hence ignored, so that it does what it is supposed to do?  That way,
">\/>" would work as intended.

> The *real* solution here is to write a small module  lyx.perl
> which achieves in Perl the same effect as  \includecomment{lyxcode} .
> To use this, you will still have to remove all the kludgy markup
> from these environments.

Yes, indeed.  Too bad, I probably won't have time to do that at the
moment, but I'll bring it up with the original authors and see what
they say.  (e.g. it is probably pointless to revise the code if they
are going to use LyX to do the second edition.  LyX might just ruin
everything...)  Or maybe LyX isn't that bad?  I don't usually use LyX,
so I wouldn't know.

Sorry about my rant.  It was just that I pulled too many hairs out
trying to figure out what went wrong and how to fix it.  I failed, so I
wrote a kludge instead:  I used -\/-\/- in the source, and used this to
change the *.tex file for the HTML version:

   # Work around a LaTeX2HTML bug that ignores \/ or \textcompwordmark{}
   # for breaking up "---" and "--" ligatures.
   s%(-\\/-\\/-)%\\latexhtml\{$1\}\{\\begin\{rawhtml\}---\\end\{rawhtml\}\}%g;

Okay, maybe I should have "s/bug/non-bug/;"  :-p  :-)
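
To make the effect of that substitution concrete, here is what it does
to one line of source.  The surrounding text "foo" and "bar" is made up
for this example:

```latex
% Before the substitution (the shared source):
foo-\/-\/-bar

% After the substitution (the copy of the *.tex file used for the
% HTML version): the first argument of \latexhtml is what LaTeX
% typesets, the second is what LaTeX2HTML uses.
foo\latexhtml{-\/-\/-}{\begin{rawhtml}---\end{rawhtml}}bar
```

Note that the pattern only matches runs of exactly the form -\/-\/-,
so a two-hyphen -\/- is left untouched.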

Cheers, and thanks for all your help and tips, which will definitely
come in handy for future projects.  :-)

Anthony

-- 
Anthony Fok Tung-Ling                Civil and Environmental Engineering
foka@ualberta.ca, foka@debian.org    University of Alberta, Canada
Debian Chinese Project -- http://www.debian.org/international/chinese/
Come visit Our Lady of Victory Camp -- http://www.olvc.ab.ca/