<!DOCTYPE html><html><head>

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

  </head>

  <body>

    <p>Hi Ross</p>

    <p>If I may broaden the discussion a bit...<br>

    </p>

    <div class="moz-cite-prefix">On 11/05/2024 01:14, Ross Moore wrote:</div>

    <blockquote type="cite" cite="mid:F1F76E21-04CE-41AF-BF9D-71636556372C@mq.edu.au">

      <div><br>

        <blockquote type="cite">

          <div>On 10 May 2024, at 7:30 pm, Alex Watson <a class="moz-txt-link-rfc2396E" href="mailto:alexander.watson@ucl.ac.uk">&lt;alexander.watson@ucl.ac.uk&gt;</a>

            wrote:</div>

          <br>

        </blockquote>

        <blockquote type="cite">

          <div>

            <div>

              <p>- but it was not accepted because the maintainer (very

                reasonably) did not want to introduce custom markup

                without an accepted practice that would align with

                future tagged PDF practices etc. that Ulrike et al are

                developing.<br>

              </p>

            </div>

          </div>

        </blockquote>

        <div>… since without a fully tagged PDF structure tree, there

          isn’t any other way to tell when you are in a table cell,

          header or otherwise, or even the table itself.</div>

        <div>It has to be done by whatever software is interpreting the

          LaTeX source.</div>

        <div><br>

        </div>

        <div>Although not a complete solution, most tables will be of

          the type where the 1st cell in a row is &lt;TH&gt;</div>

        <div>and the first non-compounded (using \multicolumn) cell in a

          column is similarly a &lt;TH&gt;.</div>

        <div>Using Booktabs, so that \midrule can be the boundary

          between &lt;THead&gt; and &lt;TBody&gt;, is certainly a good

          idea.</div>

        <div><br>

        </div>

        <div>Using these ideas, call them heuristics if you like, you

          can get a long way into producing fully tagged tables,</div>

        <div>whether for Tagged PDF or for HTML.</div>

        <div><br>

        </div>

        <div>I gave a talk on precisely this topic at TUG 2022:</div>

        <a href="https://www.youtube.com/watch?v=E1oFa3DbyoE&list=PLLt9mKFAx-FaKzET1DNj-wD-g8YG3_r1m&index=17" originalsrc="https://www.youtube.com/watch?v=E1oFa3DbyoE&list=PLLt9mKFAx-FaKzET1DNj-wD-g8YG3_r1m&index=17" shash="TMfgf4PrOImsvme1RDmLkwvjcWrRuf/zK2F11wJodODmpMDSkB9uJ4bHb3qkyIIARPQH6b3PaeFBy8kdUJP6rZwk+uqPm1q7YL0qfk+dHTxHqsQEOG270/XyCkG+mllXGfJegYOyATm98LfkxsiFgYN41VHEJmC+1m89KJjifsY=" moz-do-not-send="true">https://www.youtube.com/watch?v=E1oFa3DbyoE&list=PLLt9mKFAx-FaKzET1DNj-wD-g8YG3_r1m&index=17</a>

        <div><br>

        </div>

        <div>Links to example PDFs, and conversions to HTML, can be

          found at:</div>

        <div><a href="http://web.science.mq.edu.au/~ross/TaggedPDF/TUG2022/" originalsrc="http://web.science.mq.edu.au/~ross/TaggedPDF/TUG2022/" shash="vd3BfQwYx44q/UmYVwmjyq+8FR0Gh5bPa/r+OU5vwaU6MSK+wHkak/zGAYfnXK+TTAZ59gH2GcW+Ut6LcDhxKarxqZWIzEeV31xXxxRpYpQ+zxD5kf1nVtxeUWiHSjS7J7bphy49LRF+nsxtVdsL8l2QzeOoqcNuRTtYKmggNi4=" moz-do-not-send="true">http://web.science.mq.edu.au/~ross/TaggedPDF/TUG2022/</a></div>

      </div>

    </blockquote>

    <p>I saw your talk when it came up in 2022, and the mechanism for

      this is very interesting, but I think (and I wonder if you agree)

      that some kind of interface will eventually be required, no matter

      how clever the heuristics.</p>

    <p>For example, it is very hard to distinguish between tables where

      both the first row and first column contain headers (as in all the

      tables in your 'real world' examples) and tables where only the

      first row contains headers. While there might be additional

      heuristics (e.g. a boldface first column), there will always be a

      lot of potential for ambiguity.</p>

    <p>Given that progress on tagged PDF is likely to be ongoing for

      several years, it would be nice to have some kind of interface for

      document authors to indicate table heading cells. If there were a

      stable interface, maintainers of HTML translation packages

      (TeX4ht, lwarp, LaTeXML etc.) could implement this as a core part

      of their system, rather than relying on ad hoc solutions like the

      custom config.cfg that Michal offered in this thread. Until this

      happens, most TeX-generated HTML in the wild will simply not have

      accessible tables.<br>

    </p>

    <p>Have the tagged PDF team given any thought as to what this

      interface (or a minimal functional subset of it) might look like,

      and whether it could be made public in advance of the

      corresponding work on tagging?</p>

    <p>For instance, a minimal solution might be as I suggested:

      provide a macro to explicitly tag header cells, and if it appears

      in a table, then abandon the heuristics and just follow the

      author's explicit requests. This would be enough for HTML

      production, though I don't know whether it would be sufficient

      for generating a valid tagged PDF table. A more advanced interface

      might include explicitly selecting a heuristic, and so on.</p>
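    <p>To make the minimal proposal concrete, here is a rough sketch in
      Python of the decision a converter might make for each cell. The
      function name classify_cells and the explicit_headers argument
      (standing in for cells marked with a hypothetical header macro) are
      assumptions for illustration only, not an existing interface in
      any of the packages mentioned. The fallback is just the
      "first row and first column" heuristic discussed above.</p>

<pre>
```python
# Hypothetical sketch: decide which cells of a simple table become header
# cells ("th") and which become data cells ("td"). Nothing here is an
# existing API of tex4ht, lwarp, LaTeXML or the tagging project.

def classify_cells(rows, explicit_headers=None):
    """Return a grid of "th"/"td" labels with the same shape as `rows`.

    rows: list of rows, each a list of cell strings.
    explicit_headers: optional set of (row, col) positions that the author
        marked as headers (a stand-in for a hypothetical header macro).
    """
    if explicit_headers:
        # Author markup is present: abandon the heuristics entirely and
        # follow the explicit requests only.
        return [
            ["th" if (r, c) in explicit_headers else "td"
             for c in range(len(row))]
            for r, row in enumerate(rows)
        ]
    # Fallback heuristic: treat the first row and the first column as headers.
    return [
        ["th" if r == 0 or c == 0 else "td" for c in range(len(row))]
        for r, row in enumerate(rows)
    ]
```
</pre>

    <p>With a three-row, two-column table, the fallback marks the whole
      first row and first column as headers, whereas passing
      explicit_headers={(0, 0), (0, 1)} marks only the first row, which
      is exactly the case the heuristic cannot detect on its own.</p>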

    <p>Best wishes,</p>

    <p>Alex<br>

    </p>

  </body>

</html>