[tex4ht] progress on fixing links in book, and via that, some information towards a diagnosis of the problem
Michal Hoftich
michal.h21 at gmail.com
Tue Mar 11 16:03:38 CET 2025
Hi Chet,
Is your problem fixed now with the new build file? The IDs can be wrong
during the first several compilation runs, but I expect that they should be
good now.
Best regards,
Michal
On Fri, Mar 7, 2025 at 4:21 PM Chet Murthy <murthy.chet at gmail.com> wrote:
>
> I've been fighting a problem that basically anything I put in a document,
> howsoever tiny, causes TOC, \ref->\label and \cite->\biblio to all break.
> And by that I mean, figures/labels, listings, for sure. But even just
> index entries.
>
> TL;DR I wrote a tool to analyze and fix broken links in the generated
> EPUB, and found that almost all (that is to say, all but ONE -- nearly 500)
> were fixable by doing some minor matching-up of busted hrefs with ids using
> simple pattern-matching, as described below. The fixed EPUB worked fine,
> and I was able to verify that the fixed links were correct (hence, not just
> stitching-together random hrefs->ids in the book).
>
> ===========================
>
> I had a brain-fart last night, and wrote a tool to implement it. I have
> been noticing that the fragment-ids look like
>
> xNNNN-MMMMM
>
> (where the "MMMMM" can contain dots and maybe some letters, but rarely so).
>
> And I noticed a few things:
>
> (1) [multi-file problem] sometimes an href in file F1 looks like "#x1-32"
> when there is no such id in file F1, but there IS such an id in file F2.
> One might call this an unfortunate artifact of the generation of multiple
> files: if only a single HTML file were generated, this wouldn't even be a
> problem.
>
> (2) [wrong prefix problem] sometimes (a real example) an href looks like
> "#x7-15001", and there is no such ID anywhere in the files. But there IS
> an ID "#x8-15001".
>
> (3) and sometimes, more than one ID in the files is identical, viz.
>
> mainch1.html#x6-80001
> mainch2.html#x8-80001
>
> So I wrote a tool that found all the hrefs and IDs, and "did the math"
> that is implied by the above thinking, and found that in my book there were:
>
> * ONE instance of #3
> * THREE instances of #1
> * 446 instances of #2
>
>
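
The classification described in the quoted message could be sketched roughly as below. This is not Chet's tool, only a minimal illustration under stated assumptions: the function name `analyze`, the report keys, and the regular expression for the "xNNNN-MMMMM" shape are all made up for this example, and reading case (3) as "several ids share the part after the hyphen" is my interpretation of the mainch1/mainch2 example. Input is a plain dict mapping file name to HTML text, so unpacking the EPUB is left out.

```python
from collections import defaultdict
from html.parser import HTMLParser
import re

# Assumed shape of a fragment id: "x", digits, "-", then the rest (xNNNN-MMMMM).
FRAG_RE = re.compile(r"^x(\d+)-(.+)$")


class _Collector(HTMLParser):
    """Collect every id attribute and every href that carries a fragment."""

    def __init__(self):
        super().__init__()
        self.ids = []
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if value is None:
                continue
            if name == "id":
                self.ids.append(value)
            elif name == "href" and "#" in value:
                self.hrefs.append(value)


def analyze(pages):
    """pages maps file name -> HTML text; returns a dict of broken-link lists."""
    ids_by_file = {}
    hrefs = []  # (source file, target file, fragment)
    for fname, text in pages.items():
        collector = _Collector()
        collector.feed(text)
        ids_by_file[fname] = set(collector.ids)
        for href in collector.hrefs:
            target, frag = href.split("#", 1)
            hrefs.append((fname, target or fname, frag))

    files_by_id = defaultdict(set)    # exact id -> files that define it
    ids_by_suffix = defaultdict(set)  # "MMMMM" part -> (file, id) pairs
    for fname, ids in ids_by_file.items():
        for ident in ids:
            files_by_id[ident].add(fname)
            m = FRAG_RE.match(ident)
            if m:
                ids_by_suffix[m.group(2)].add((fname, ident))

    report = {"wrong_file": [], "wrong_prefix": [], "ambiguous": [], "unmatched": []}
    for src, target, frag in hrefs:
        if target not in ids_by_file:
            continue  # external link, or a file outside the set we were given
        if frag in ids_by_file[target]:
            continue  # link is fine
        if frag in files_by_id:
            # Case (1): the id exists, but in a different file.
            report["wrong_file"].append((src, frag, sorted(files_by_id[frag])))
            continue
        m = FRAG_RE.match(frag)
        candidates = sorted(ids_by_suffix.get(m.group(2), ())) if m else []
        if len(candidates) == 1:
            # Case (2): exactly one id has the same suffix under another prefix.
            report["wrong_prefix"].append((src, frag, candidates[0]))
        elif candidates:
            # Case (3), as interpreted here: several ids share the suffix.
            report["ambiguous"].append((src, frag, candidates))
        else:
            report["unmatched"].append((src, frag))
    return report
```

Entries under "wrong_file" and "wrong_prefix" carry enough information to rewrite the href mechanically; "ambiguous" and "unmatched" entries would still need a human to look at them.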