<!DOCTYPE html><html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<p>Hi Ross</p>
<p>If I may broaden the discussion a bit...<br>
</p>
<div class="moz-cite-prefix">On 11/05/2024 01:14, Ross Moore wrote:</div>
<blockquote type="cite" cite="mid:F1F76E21-04CE-41AF-BF9D-71636556372C@mq.edu.au">
<div><br>
<blockquote type="cite">
<div>On 10 May 2024, at 7:30 pm, Alex Watson <a class="moz-txt-link-rfc2396E" href="mailto:alexander.watson@ucl.ac.uk">&lt;alexander.watson@ucl.ac.uk&gt;</a>
wrote:</div>
<br>
</blockquote>
<blockquote type="cite">
<div>
<div>
<p>- but it was not accepted because the maintainer (very
reasonably) did not want to introduce custom markup
without an accepted practice that would align with
future tagged PDF practices etc. that Ulrike et al are
developing.<br>
</p>
</div>
</div>
</blockquote>
<div>… since without a fully tagged PDF structure tree, there
isn’t any other way to tell when you are in a table cell,
header or otherwise, or even the table itself.</div>
<div>It has to be done by whatever software is interpreting the
LaTeX source.</div>
<div><br>
</div>
<div>Although not a complete solution, most tables will be of
the type where the 1st cell in a row is &lt;TH&gt;</div>
<div>and the first non-compounded (using \multicolumn) cell in a
column is similarly a &lt;TH&gt;.</div>
<div>Using Booktabs, so that \midrule can be the boundary
between &lt;THead&gt; and &lt;TBody&gt;, is certainly a good
idea.</div>
<div><br>
</div>
<div>Using these ideas, call them heuristics if you like, you
can get a long way into producing fully tagged tables,</div>
<div>whether for Tagged PDF or for HTML.</div>
<div><br>
</div>
<div>I gave a talk on precisely this topic at TUG 2022:</div>
<a href="https://www.youtube.com/watch?v=E1oFa3DbyoE&list=PLLt9mKFAx-FaKzET1DNj-wD-g8YG3_r1m&index=17">https://www.youtube.com/watch?v=E1oFa3DbyoE&list=PLLt9mKFAx-FaKzET1DNj-wD-g8YG3_r1m&index=17</a>
<div><br>
</div>
<div>Links to example PDFs, and conversions to HTML, can be
found at:</div>
<div><a href="http://web.science.mq.edu.au/~ross/TaggedPDF/TUG2022/">http://web.science.mq.edu.au/~ross/TaggedPDF/TUG2022/</a></div>
</div>
</blockquote>
<p>I saw your talk when it came out in 2022, and the mechanism for
this is very interesting, but I think (and I wonder if you agree)
that some kind of explicit author-facing interface will eventually
be required, no matter how clever the heuristics.</p>
<p>For example, it is very hard to distinguish between tables where
both the first row and first column contain headers (as in all the
tables in your 'real world' examples) and tables where only the
first row contains headers. While there might be additional
heuristics (e.g. a boldface first column), there will always be a
lot of potential for ambiguity.</p>
<p>Given that progress on tagged PDF is likely to be ongoing for
several years, it would be nice to have some kind of interface for
document authors to indicate table heading cells. If there were a
stable interface, maintainers of HTML translation packages
(tex4ht, lwarp, LaTeXML etc.) could implement this as a core part
of their system, rather than relying on ad hoc solutions like the
custom config.cfg that Michal offered in this thread. Until this
happens, most TeX-generated HTML in the wild will simply not have
accessible tables.<br>
</p>
<p>Has the tagged PDF team given any thought as to what this
interface (or a minimal functional subset of it) might look like,
and whether it could be made public in advance of the
corresponding work on tagging?</p>
<p>For instance, a very minimal solution might be as I suggested:
provide a macro to explicitly tag header cells, and if it appears
in a table, then abandon the heuristics and just follow the
author's explicit requests. This would be enough for HTML
production (but I don't know if it would be sufficient for
generating a valid tagged PDF table). A more advanced interface
might include explicitly selecting a heuristic, and so on.</p>
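<p>To make the minimal version concrete, here is a rough sketch of
what an author might write. The macro name <code>\headercell</code>
is purely hypothetical (it is not part of any existing package, nor
of the tagging project), and the table contents are made up for
illustration. In print the macro does nothing; the idea is only that
it gives a converter something unambiguous to hook into.</p>
<pre>
\documentclass{article}
\usepackage{booktabs}

% Hypothetical interface: \headercell marks its argument as a header cell.
% For print output it is a no-op. An HTML converter or a tagging layer
% could redefine it to produce a header cell instead of a data cell.
\newcommand{\headercell}[1]{#1}

\begin{document}
\begin{tabular}{rrr}
  \toprule
  % Only the first row is marked as headers.
  \headercell{Length} & \headercell{Width} & \headercell{Height} \\
  \midrule
  % Illustrative values: the first column is data, not a row header.
  1.0 & 2.0 & 3.0 \\
  1.5 & 2.5 & 3.5 \\
  \bottomrule
\end{tabular}
\end{document}
</pre>
<p>In this example the first column holds data, which a "first cell
in each row is a header" heuristic would presumably get wrong. With
explicit markup present, a converter could treat only the marked
cells as headers and leave every other cell as ordinary data.</p>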
<p>Best wishes,</p>
<p>Alex<br>
</p>
</body>
</html>