[l2h] hyperref.sty vs. html.sty: "\textunderscore undefined"

Ross Moore ross@ics.mq.edu.au
Sat, 14 Sep 2002 14:34:38 +1000 (EST)


Hello again Julius, and Heiko,

> At 08:54 PM 9/13/2002 +1000, Ross Moore wrote:
> >In which order did you load the packages ?
> >
> >If you do
> >  \usepackage{hyperref}
> >  \usepackage{html}

The difficulty in making URLs work correctly with both
 hyperref  and  html  syntax comes from category-codes
and from the order in which the arguments need to be presented.

With hyperref's  \href  command, the arguments are:
  \href{<URL>}{<anchor text>} 
whereas for \htmladdnormallink the arguments are reversed:
  \htmladdnormallink{<anchor text>}{<URL>}

At first sight it looks sufficient for   hyperref.sty  to
implement  \htmladdnormallink  as 
 \htmladdnormallink #1#2 <-- \href{#2}{#1} 
but this is *not* correct, as the category-codes of certain
characters need to be changed *before* the <URL> argument is read.

Similarly  html.sty  cannot implement \href as
 \href #1#2 <-- \htmladdnormallink{#2}{#1}
for LaTeX processing (though the analogue of this will work
for Perl processing within LaTeX2HTML).

Of course if no special characters occur within the <URL> then
these simple definitions do indeed work adequately.
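
As a rough sketch of the failure described above (the macro name
\naivelink and the example.org addresses are made up for illustration,
and hyperref is assumed to be loaded):

  % Naive wrapper: both arguments are tokenized when \naivelink reads
  % them, i.e. before \href gets a chance to change any \catcode.
  \newcommand{\naivelink}[2]{\href{#2}{#1}}

  % OK: no special characters in the URL.
  \naivelink{Example}{http://www.example.org/index.html}

  % Fails: "%" still starts a comment while the argument is scanned,
  % so the rest of the line, closing brace included, is thrown away.
  %\naivelink{Example}{http://www.example.org/a%20b.html}

  % OK: used directly, \href reads the <URL> itself, with the
  % category-codes already changed.
  \href{http://www.example.org/a%20b.html}{Example}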


When I make documents that are to be processed by both LaTeX2HTML
and pdfLaTeX, I make a local definition for each <URL> that occurs.
This is done using either the conditional environments defined in  html.sty
or by conditionally loading different files with alternate definitions;
e.g.

%begin{latexonly}
 \urldef\rossURL\url{http://www.maths.mq.edu.au/~ross/}
 \urldef\chrisURL\url{http://..... }
 ....
 ....
%end{latexonly}
\begin{htmlonly}
 \def\rossURL{http://www.maths.mq.edu.au/\~{}ross/}
 \def\chrisURL{http://..... }
 ....
 ....
\end{htmlonly}

\newcommand{\rossHome}{\htmladdnormallink{Ross' Home Page}{\rossURL}}


Then the body of the document contains just references like \rossHome,
or other anchors of the form:
  \htmladdnormallink{<anchor text>}{\rossURL}

This helps keep the body of the LaTeX document more easily readable,
and allows updates to URLs to be made in the preamble only,
rather than looking for all instances within the body.


Using \input files, this can be simplified to:

%begin{latexonly}
 \input PDFurls
%end{latexonly}
\begin{htmlonly}
 \input HTMLurls.tex
\end{htmlonly}
\newcommand{\rossHome}{\htmladdnormallink{Ross' Home Page}{\rossURL}}

to get the correct definitions for the different processors.
Of course now you need to keep 2 files up-to-date with the
correct URL addresses and macro definitions.
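
For illustration only, and assuming the two files simply hold the
definitions shown earlier (the split below is a guess at a sensible
layout, not a required one), they might look like:

  % PDFurls.tex : read by LaTeX / pdfLaTeX only
  \urldef\rossURL\url{http://www.maths.mq.edu.au/~ross/}

  % HTMLurls.tex : read by LaTeX2HTML only
  \def\rossURL{http://www.maths.mq.edu.au/\~{}ross/}

Each URL then appears once per file, under the same macro name, so the
body of the document does not need to know which processor is running.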



> >Then it should work OK, since there is coding in  html.sty  to
> >check whether hyperref has been loaded already, and act accordingly.
> 
> Aha! My ordering was in fact the opposite, as recommended in the hyperref 
> documentation.  I have now interchanged the order, and the behavior is 
> indeed different:

Sure. Early documentation suggested that hyperref needs to be loaded last.
This was because hyperref redefines many things; loading it last made it
easy to detect whether a package had been loaded and then make the
redefinition.

But that isn't the best approach.
Better is to define the new expansions, regardless of whether the package
has been loaded or not, then use  \AtBeginDocument{....}  to do the test
at a later, fixed, place within the LaTeX processing.
If the package has been loaded, then the relevant macro-names get \let to
the hyperref versions, perhaps inheriting from the package's own definitions.
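
A minimal sketch of that approach, as it might appear in a document
preamble. The names \mylink, \my@plainlink and \my@hyperlink are
invented for this example; this is not the actual coding in  html.sty
or  hyperref.sty :

  \makeatletter
  % Define both expansions unconditionally; \href need not exist yet,
  % since it is only looked up when \my@hyperlink is actually used.
  \newcommand{\my@plainlink}[2]{#1\footnote{#2}}
  \newcommand{\my@hyperlink}[2]{\href{#2}{#1}}
  % Defer the test to a fixed point, after all packages are loaded.
  \AtBeginDocument{%
    \@ifpackageloaded{hyperref}%
      {\let\mylink\my@hyperlink}%
      {\let\mylink\my@plainlink}%
  }
  \makeatother

With this, \mylink{<anchor text>}{<URL>} does not depend on whether
hyperref was loaded before or after these lines. The category-code
caveat for special characters in the <URL> still applies.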

Most of hyperref's special package support is now done this way, I think,
so its loading order is not so crucial any more.
Heiko should be able to confirm whether this is indeed true.
(Sorry to dob you in, Heiko :-)

 
> * The anchor text in \htmladdnormallinkfoot is no longer linked in the PDF 
> (which I miss).  The cursor changes to a hand-index-finger-pointer when the 
> mouse rolls over the footnote number itself, as if it were a link, but it 
> doesn't seem to go anywhere.  Perhaps it just links to the page containing 
> the footnote, which is usually the current page.

You can test this easily.
Use a page-view in which the anchor text is visible but the footnote is
not, and which does not start at the top of the page. What happens when
you click the link?
It should either jump to include the footnote,
or jump to the top of the page, perhaps resizing.


> 
> * The footnote from \htmladdnormallinkfoot is now linked in the PDF (an 
> improvement, in my opinion).
> 
> * (The anchor text within \htmladdnormallink is PDF-linked as before.)
> 
> * The \url{} quoting solution I posted earlier no longer works.  Now, all 
> footnote URLs (on one particular page in the PDF file, anyway) point to the 
> same bogus URL:
> 
> http://www-ccrma.stanford.edu/~jos/intro320/protect%20%08egingroup%20catcode%20`%20active%20def%20%20{%20}catcode%20`%active%20let%20%%let%20%%catcode%20`.pdf

That's because \url has been expanded, but not executed, making a real mess.

> 
> which was apparently derived from the first footnote URL 
> (http://www-ccrma.stanford.edu/~jos/intro320/Online_Reference.html).  All 
> URLs look fine in the footnotes, and they look right in the PostScript file 
> as well.  All links work correctly in the HTML.  Thus, only the PDF seems 

In Perl, multiple passes over the same string can be made.
Thus different patterns, such as:  ~ \~{} &nbsp; %2E etc. 
can all be recognised and replaced appropriately.
Macro-programming in TeX is inherently more difficult.

> to be thrown off by something.
 
> * URLs in the second argument of htmladdnormallinkfoot which are NOT 
> enclosed by \url{} have the problem (in the PDF file) of empty boxes 
> remaining quoted in the URL, e.g.,
> 
> http://www-ccrma.stanford.edu/~{}serafin/320/lab1/Introduction_matlab.html

Yes; the \~{} trick is really only useful for a quick-and-dirty visual
effect on paper. LaTeX2HTML handles it OK, due to the flexibility of Perl.
However, it is logically incorrect, and causes problems for active URLs
when processed by LaTeX or pdfTeX.

> 
> As we all know, '\~' must be entered as '\~{}' to avoid being interpreted 
> as calling for a tilde over the next character when processed by TeX.

No.  \href  will interpret the ~ verbatim, since it switches the \catcode,  ...

   ... but you cannot use this (within \href or \url) as part of the expansion
of a macro, unless you have altered the \catcode locally before the macro
is defined.  This is exactly what  \urldef  does.
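
A hand-rolled sketch of the same idea, for the tilde only. The macro
name \homeURL and the address are hypothetical, and  \urldef  itself
deals with more characters than this:

  \begingroup
    \catcode`\~=12 % "~" is an ordinary character from here on
    \gdef\homeURL{http://www.example.org/~user/}
  \endgroup

  \href{\homeURL}{Home page}

Because the ~ was already an ordinary character when \homeURL was
defined, its expansion should no longer contain an active tilde.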


> (Note that this problem persists with either ordering of hyperref.sty and
> html.sty.)  I presently have no workaround for this problem using the new 
> ordering of the packages.


 
> In summary, my only workaround at present (which seems complete, as far as 
> I have tested) is to use the package order
> 
>   \usepackage{html}
>   \usepackage{hyperref}
> 
> as requested in the hyperref installation notes, and to quote all URLs 
> using \url{} rather than trying to quote '~' "in the open", which fails 
> because '\~{}' translates to '~{}' in the PDF URLs.

I must have written the Perl coding to gobble up \url in case it occurs
at the beginning of a <URL>. Otherwise this wouldn't work for HTML.

 
> > > 
> > \htmladdnormallinkfoot{description}{\url{http://www.somewhere.org/~name/file_name.html}}
> >
> >You don't need the  \url ; indeed it is just plain wrong,
> >so I'm surprised to hear you say that it works.
> 
> That's too bad, because presently it is the only thing that does work in 
> all contexts that I've seen so far.  I will continue my testing and report 
> any problems I discover.

It's a hard problem because of:
  1.  need to change \catcode s
  2.  different order of parameters
  3.  a desire to write macros expanding to URLs
as explained above.

 
> It appears the "right" solution is for '{}' to be stripped out of URLs in 
> the PDF.  I don't think it can work to simply define \~ (in hyperref) as 
> having an argument (which would gobble up the '{}'), because then regular 
> tildes would not work (unless the definition was somehow confined to URL 
> processing).
 
> By the way, the newest version of hyperref, does seem to fix the '\_' 
> problem in all contexts that I've tested, so that's some real progress --- 
> thanks to Heiko Oberdiek for that.  Perhaps he will have an idea about how 
> to get rid of {} in URLs?

Perhaps.

Hope this helps,

	Ross Moore



> Julius
>