[tex-eplain] Hyperlinks in index, take two

geolsoft at mail.ru geolsoft at mail.ru
Mon Aug 1 15:49:36 CEST 2005


Hello,

I was informed that a list member had a problem with my last
post (some attachment was scrubbed).  I have not yet received
my own copy of the message back either, although I normally
do (and I posted it over 18 hours ago).  Something funny is
going on, so I decided to just repost the message, this time
with the attachments in a single .tar.gz file.  Sorry for
the inconvenience.

Below is my original message.

----- Forwarded message from geolsoft -----

Date: Sun, 31 Jul 2005 22:44:04 +0300
To: tex-eplain at tug.org
Subject: Hyperlinks in index

Hello,

I am sending a patch that enables hyperlinks in the index.
Brace yourself, this post is a long one.

There are actually two approaches.

==========

In the first approach, each indexing command (like \idx,
\sidx, \idxname, etc.) defines a unique hyperlink
destination and passes its name on to the .idx file as part
of the index term.  This has to be done because we have to
be able to trace each entry which MakeIndex generates in the
.ind file back to the place where the entry was defined.
Thus, for each index entry except see, seealso and range-end
entries, instead of writing to the .idx file a line like
this:

  \indexentry{ENTRY|PAGEMARKUP}{PAGENO}

we write a line with a hyperlink wrapper embedded behind the
scenes:

  \indexentry{ENTRY|idxhl{HLDESTNAME}{PAGEMARKUP}}{PAGENO}

Here PAGEMARKUP is the page markup command name the user
specified (may be empty), \idxhl is a wrapper which turns
PAGENO into a hyperlink, and HLDESTNAME is the unique
hyperlink label we generate.  The \idxhl macro outputs
\PAGEMARKUP{PAGENO}, wrapped in a hyperlink pointing to the
HLDESTNAME destination.
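
To make the format concrete, here is a small Python sketch
(not part of the patch) that produces such lines.  The
destination naming scheme `IDX1', `IDX2', ... and the
function names are hypothetical, chosen only for
illustration; the patch itself may name destinations
differently.

```python
import itertools


def make_entry_writer(prefix="IDX"):
    """Return a function producing .idx lines with unique destinations.

    The prefix and the counter-based naming are assumptions made for
    this illustration only.
    """
    counter = itertools.count(1)

    def write(entry, pageno, pagemarkup=""):
        # Each call gets a fresh destination name, so two otherwise
        # identical entries on the same page produce different lines.
        dest = "%s%d" % (prefix, next(counter))
        return r"\indexentry{%s|idxhl{%s}{%s}}{%s}" % (
            entry, dest, pagemarkup, pageno)

    return write


write = make_entry_writer()
first = write("foo", 3)
second = write("foo", 3)
```

Here `first' and `second' describe the same term on the same
page, yet they differ in the destination name.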

The problem with this approach, naturally, is that now
MakeIndex will regard _all_ index entries as distinct,
because each one contains a (unique) hyperlink destination
name which we write out to the .idx file.  This can be
partially cured by the attached idxuniq.awk script, which
filters out \indexentry lines that differ only in HLDESTNAME
but are identical otherwise.  This is still not ideal, as
MakeIndex can no longer form page ranges, and there will be
problems with apparently identical index entries clashing
(e.g., when a range-end entry appears on the same page as
another entry with the same definition; the idxuniq.awk
script will not filter out the second entry).
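
The filtering idea can be sketched in Python as follows.
This is not the attached idxuniq.awk script, only an
illustration of the principle; it assumes that destination
names contain no braces and that the text `|idxhl{' does not
occur inside ENTRY itself.

```python
import re

# Matches the hyperlink wrapper together with its destination name.
# Assumption: destination names contain no braces.
_DEST = re.compile(r"\|idxhl\{[^{}]*\}")


def filter_duplicates(lines):
    """Keep the first of any lines that differ only in HLDESTNAME."""
    seen = set()
    kept = []
    for line in lines:
        # Blank out the destination name to obtain a comparison key.
        key = _DEST.sub("|idxhl{}", line, count=1)
        if key in seen:
            continue
        seen.add(key)
        kept.append(line)
    return kept


idx_lines = [
    r"\indexentry{foo|idxhl{d1}{}}{3}",
    r"\indexentry{foo|idxhl{d2}{}}{3}",
    r"\indexentry{foo|idxhl{d3}{}}{4}",
]
unique_lines = filter_duplicates(idx_lines)
```

The second line is dropped because it matches the first once
the destination name is ignored; the third stays because its
page number differs.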

==========

The second approach does not write out any destination
names.  Instead, we write a pageencap wrapper which parses
the page number and generates a link pointing to the _page_
on which the term appeared.  The idea is to write an
\indexentry line like this:

  \indexentry{ENTRY|idxhlpage{PAGEMARKUP}}{PAGENO}

where idxhlpage{PAGEMARKUP} is what we substitute for
PAGEMARKUP behind the scenes.

With this approach, all features of MakeIndex are intact.
However, this approach depends on the page range and page
list delimiters which MakeIndex was configured to output.
Therefore I provide two macros:

  \setidxpagelistdelimiter
  \setidxpagerangedelimiter

which take one parameter, the delimiter, and set up eplain
so that it correctly parses PAGENO.  We need to know the
list delimiter in addition to the range delimiter because
the PAGENO that MakeIndex outputs takes one of the following
forms:

  1     (single page number)
  1--5  (page range)
  1, 2  (list of two consecutive page numbers)
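
A minimal Python sketch of the classification (the real
parsing is done by TeX macros in the patch; the function
name and return format here are made up for illustration).
The defaults `, ' and `--' are the delimiters shown above.

```python
def parse_pageno(pageno, list_delim=", ", range_delim="--"):
    """Classify PAGENO as a single page, a two-page list, or a range.

    Lists are checked first, mirroring the order the post says
    \\idxhlpage currently uses.
    """
    if list_delim in pageno:
        first, second = pageno.split(list_delim, 1)
        return ("list", first, second)
    if range_delim in pageno:
        first, last = pageno.split(range_delim, 1)
        return ("range", first, last)
    return ("single", pageno)


single = parse_pageno("1")
page_range = parse_pageno("1--5")
page_list = parse_pageno("1, 2")
```

If MakeIndex is configured with other delimiters, they would
be passed in place of the defaults, which is the role
\setidxpagelistdelimiter and \setidxpagerangedelimiter play
in eplain.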

Therefore, we have to be able to parse both page ranges and
two-page lists to get the page numbers straight.
Unfortunately, as a result of parsing and splitting the
two-page lists, this approach is not fully compatible with
pre-hyperlink behavior.  Because we have to wrap up each
page number from the two-page list into a separate
hyperlink, we have to split the list, so that the result
becomes something like this:

  \hlstart \PAGEENCAP{1}\hlend, \hlstart \PAGEENCAP{2}\hlend

Note that before (or now, when not using hyperlinks), you
would get this instead:

  \PAGEENCAP{1, 2}

The difference is that `, ' used to be typeset inside the
parameter to \PAGEENCAP, but now it is outside of
\PAGEENCAP.
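
The two behaviors can be contrasted with a short Python
sketch.  It only mimics the schematic output shown above
(\hlstart and \hlend are written without arguments, as in
the example), and it leaves ranges aside, since their
wrapping is not spelled out here.  The function names are
hypothetical.

```python
def render_old(pageno, encap="PAGEENCAP"):
    """Pre-hyperlink behavior: the whole PAGENO goes inside the encap."""
    return "\\%s{%s}" % (encap, pageno)


def render_linked(pageno, encap="PAGEENCAP", list_delim=", "):
    """Hyperlink behavior for single pages and two-page lists.

    Each page number gets its own link, so the list delimiter ends up
    outside the encapsulator.  Ranges are not handled in this sketch.
    """
    pages = pageno.split(list_delim)
    links = ["\\hlstart \\%s{%s}\\hlend" % (encap, p) for p in pages]
    return list_delim.join(links)


old = render_old("1, 2")
new = render_linked("1, 2")
```

Here `old' holds the single \PAGEENCAP{1, 2} form, while
`new' holds two separately linked page numbers joined by the
delimiter.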

I am not sure why MakeIndex combines two consecutive page
numbers under a single \PAGEENCAP (efficiency?), but IMHO it
was a bad idea, as it can result in the following ugliness:

  \PAGEENCAP{1, 2}, \PAGEENCAP{5}

Imagine \PAGEENCAP makes its argument italic.  Here, you get
the first comma in italic, and the second comma in roman.
Even worse happens if you want \PAGEENCAP to underline page
numbers.  So the splitting of the list actually `fixes' such
situations, but it makes eplain incompatible with how it
operated before.

==========

To summarize, here are the advantages (+) and disadvantages
(-) of the two approaches:

Approach 1

  + Hyperlinks point to exact locations of index terms.
    This seems to be quite a nice feature (I hate to
    sometimes spend half a minute searching for an index
    term on a crowded page).

  - Much of the useful functionality of MakeIndex cannot be
    employed (like implicit page range formation).

Approach 2

  - Hyperlinks point to the page on which an index term is
    defined, not to the exact location of the term.  The
    reader will have to search through the page to locate
    the term.

  - When MakeIndex is configured to use non-standard page
    list and page range delimiters, eplain also has to be
    informed of the configuration.

  - A little incompatibility with (improvement on?) `old'
    eplain.

  + All functionality of MakeIndex works fine with this
    approach.

==========

Since neither approach will be 100%-compatible with previous
releases, I decided to do the following.  If hyperlinks are
not enabled, everything works just like in `old' eplain.
If hyperlinks are enabled, the second approach is the
default.

The method currently used to select the scheme is quite
crude--you have to define \idxhldestplace to 0 (second
approach), 1 (first approach) or a negative number (no links
in the index).  If both approaches are accepted for
inclusion in eplain, maybe I should provide a better
mechanism.  For instance, the scheme for index hyperlinks
could be selected by \enablehyperlinks.  It could take a
second optional parameter to be one of `none', `page' or
`exact'.  Or maybe we could have only one optional parameter
to act as a list of options, with `idx=none', `idx=page',
`idx=exact' selecting `no links', `page links' or `exact
links' for the index, and assume that an option without a
`=' is a hyperlink driver name.  I don't think we should
provide a way to switch between the two approaches--the
appropriate timing for the switching might be tricky because
of the way the destinations are defined, and besides I don't
think anyone will want to switch back and forth.
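
To illustrate the single-option-list idea, here is a Python
sketch of how such a list might be interpreted.  This is
only a model of the proposal, not an existing eplain
interface: the function name is hypothetical, -1 stands in
for `a negative number', and the default of `page' follows
from the second approach being the default.

```python
# Proposed option value -> current \idxhldestplace value.
IDX_MODES = {"none": -1, "page": 0, "exact": 1}


def parse_hyperlink_options(optlist):
    """Split a comma-separated option list into driver and index mode.

    An option without `=' is taken to be the hyperlink driver name.
    Returns (driver, mode_name, idxhldestplace_value).
    """
    driver = None
    mode = "page"  # second approach is the default
    for opt in optlist.split(","):
        opt = opt.strip()
        if not opt:
            continue
        if "=" in opt:
            key, value = (s.strip() for s in opt.split("=", 1))
            if key != "idx" or value not in IDX_MODES:
                raise ValueError("unknown option: %s" % opt)
            mode = value
        else:
            driver = opt
    return driver, mode, IDX_MODES[mode]


exact = parse_hyperlink_options("pdftex, idx=exact")
default = parse_hyperlink_options("dvipdfm")
```

The first call selects exact links with the `pdftex' driver;
the second names only a driver and so falls back to page
links.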

==========

A few additional notes:

 - For efficiency, it is important to have \idxhlpage parse
   the more frequent case (list / range) first.  However,
   different types of books seem to provide different
   statistics.  In most manuals and math books, two-page
   lists appear to be prevailing.  But in more `relaxed'
   publications (like The TeXbook), ranges are more
   frequent.  Currently, \idxhlpage parses lists first.

 - Since we define the list and range parsing macros for
   internal use, we might as well let the user access them.
   They may actually be quite useful; see the \ituline
   encapsulator in the test file.

 - There is no `automagic' hyperlink support for see and
   seealso entries, as there is not enough information to
   relate the parameters of \indexsee and \indexseealso to
   corresponding index entries.  But if much desired, this
   can be implemented `manually', see the test file for an
   example.

 - To the maintainer:  On 31 Mar 2005, I submitted a patch
   under thread `Patch to add encapsulation support with
   index range terms'.  It is not yet applied, but I needed
   this functionality, so the patch I send now covers the
   old patch.  So disregard the old patch, except for the
   documentation changes.  Anyway, I can update the
   documentation when documenting hyperlinks, so forget it
   altogether.

==========

That was some post.  Hope it makes sense.

-- 
Best regards,
Oleg Katsitadze

----- End forwarded message -----
-------------- next part --------------
A non-text attachment was scrubbed...
Name: hlidx.tar.gz
Type: application/octet-stream
Size: 6093 bytes
Desc: not available
Url : http://tug.org/pipermail/tex-eplain/attachments/20050801/5cf00e47/hlidx.tar-0001.obj

