[OS X TeX] Problem with TeX processing of SGML index

Jeremy Malcolm Jeremy at Malcolm.id.au
Wed Mar 8 02:33:32 CET 2006


I am using LyX and the LyX to DocBook conversion scripts at
http://www.karakas-online.de/mySGML.  These are designed to allow you to
write in LyX, and then via LyX's export to DocBook SGML, to convert into
various other formats such as HTML, PDF, etc.  (Yes, I know that I could
just convert direct from LyX/LaTeX to those formats.)

My particular issue is with the PDF conversion which goes from DocBook
SGML (back!) to TeX and thence to PDF using OpenJade.  Possibly we are
straying into off-topic territory here, but since I am using a Mac to do
this, and it is a TeX problem as far as I can tell, I thought I would
try this list anyway.

My problem lies in index creation.  Index creation with DocBook is done
using the following steps in a nutshell:

* First, insert <indexterm> entries in your SGML (or XML) document.
    An index entry is only placed where there is an <indexterm>, so if
    you want the same word indexed whenever it occurs you have to
    manually insert <indexterm>s at each occurrence.

* Then create a blank index.sgml file using a Perl script included in
    the distribution which is called collateindex.pl.

* Then use jade or somesuch (I use OpenJade) to generate an HTML.index
    file that records a location reference for each of the indexed
    terms.  These are not actually page numbers but AENs (all-element
    numbers), i.e. the count of the SGML elements that precede that
    point.

* That HTML.index file is then used as input to a second run of
    collateindex.pl, which generates the real index.sgml file.  The
    index.sgml file is referenced as an external entity in your main
    source file, and you use something like OpenJade to generate the
    formats you need (HTML, TeX, PostScript, PDF...).
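
For anyone wanting to reproduce this, the steps above can be sketched
roughly as follows.  This is only a Python sketch that builds the
command lines without running them; the document name mydoc.sgml and
the stylesheet paths are placeholders I have assumed, and the mySGML
scripts may well invoke the tools differently.

```python
def index_pipeline(doc="mydoc.sgml",
                   html_dsl="html/docbook.dsl",
                   print_dsl="print/docbook.dsl"):
    """Return the index-building commands as argv lists, in order.

    The file names and stylesheet paths are hypothetical placeholders.
    """
    return [
        # 1. Create a blank index.sgml so the main document still parses.
        ["perl", "collateindex.pl", "-N", "-o", "index.sgml"],
        # 2. HTML pass with html-index set, which writes HTML.index.
        ["openjade", "-t", "sgml", "-d", html_dsl, "-V", "html-index", doc],
        # 3. Collate HTML.index into the real index.sgml.
        ["perl", "collateindex.pl", "-o", "index.sgml", "HTML.index"],
        # 4. Final pass to the wanted format, here the TeX backend.
        ["openjade", "-t", "tex", "-d", print_dsl, doc],
    ]


if __name__ == "__main__":
    # Print the commands for inspection; nothing is executed.
    for argv in index_pipeline():
        print(" ".join(argv))
```

Each list could be handed to subprocess.run() once the paths match the
local installation.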

The problem I am experiencing is that, in a large index, maybe
three-quarters of my entries don't work in the PDF output.  Where an
entry doesn't work, I get either ? (most common, no hyperlink) or ??
(rarely, hyperlinked to page 1) in place of the page number.  Even
where an entry does work, the page number it shows and links to is
wrong.

For example auDA appears in my document and is indexed eight times, but
only comes up twice: "to auDA, but retained control" on page 67 (indexed
as 22) and "control over the au ccTLD to auDA" on page 105 (indexed as 57).
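
To double-check a count like the eight above against the source, a
small script along these lines could be used.  It is only a sketch: it
assumes each entry is written as <indexterm><primary>...</primary>, and
the sample text in it is made up for illustration rather than taken
from my document.

```python
import re


def count_indexterms(sgml_text, term):
    """Count <indexterm> elements whose <primary> is exactly `term`."""
    pattern = re.compile(
        r"<indexterm\b[^>]*>\s*<primary\b[^>]*>\s*(.*?)\s*</primary>",
        re.IGNORECASE | re.DOTALL,
    )
    return sum(1 for m in pattern.finditer(sgml_text) if m.group(1) == term)


# Made-up sample input, not the real document.
sample = (
    "<para>to auDA<indexterm><primary>auDA</primary></indexterm>,"
    " but retained control</para>\n"
    "<para>control over the au ccTLD to auDA"
    "<indexterm><primary>auDA</primary>"
    "<secondary>ccTLD</secondary></indexterm></para>\n"
    "<para><indexterm><primary>ICANN</primary></indexterm>other</para>\n"
)

if __name__ == "__main__":
    print(count_indexterms(sample, "auDA"))
```

Comparing that count with the number of page references the PDF index
shows for the same term gives a quick measure of how many entries were
lost.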

Why, you are wondering, do I consider this to be a TeX problem?  Because
it only happens with the PDF (via TeX) output.  It does not happen with
the HTML output, which is perfect (and which doesn't use TeX).

Sorry if this is off-topic, but I'm running out of options here as I've
tried almost everything (e.g. paring back my DSSSL driver file, and
running a much smaller test document through the indexing process,
which does, actually, work).  Would it help for me to send some TeX or
SGML source?  Could this be a platform issue?  Sorry if I'm being vague.

TIA

PS. This mailing list is buggy.  It doesn't allow digitally signed
     messages.

-- 
Jeremy Malcolm LLB (Hons) B Com
Internet and Open Source lawyer, IT consultant, actor
host -t NAPTR 1.0.8.0.3.1.2.9.8.1.6.e164.org|awk -F! '{print $3}'
------------------------- Info --------------------------
Mac-TeX Website: http://www.esm.psu.edu/mac-tex/
          & FAQ: http://latex.yauh.de/faq/
TeX FAQ: http://www.tex.ac.uk/faq
List Archive: http://tug.org/pipermail/macostex-archives/



