[accessibility] Current packages and methods of generating tagged PDF from LaTeX

Ulrike Fischer fischer at troubleshooting-tex.de
Fri Jun 28 09:18:02 CEST 2019


>> I doubt it. Perhaps some parts can be reused, but PDF really has rather
>> special requirements. Besides, tex4ht is IMHO quite stable, mature and
>> powerful.

> The problem with the PDF syntax is that it seems to be quite
> low-level, basically HTML, so it captures a lot less information than
> what tex4ht already does, and it wouldn't work for output formats
> other than HTML (ODT, DocBook, ...).

Well, one can store a lot of info in a PDF, much more than in simple HTML.
But my remark wasn't about some PDF->HTML conversion.

> If there was a higher level structural tagging standard provided by
> the LaTeX kernel and maintained packages, I think we could reuse that
> information in tex4ht. For example something like dpub-aria:
> https://w3c.github.io/dpub-aria/#doc-chapter.
> It shouldn't be hard to map such higher level information to the PDF
> tagging, HTML 5, ODT and other formats.

Yes, we certainly should/will improve the standard LaTeX macros to
allow easier access to structure data. And tex4ht (and lwarp) can
benefit from that (that's what I meant by "some parts" above).
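
As a rough illustration of such a shared structure layer, here is a
minimal Python sketch. The role names are taken from dpub-aria; the
target tags, the ROLE_MAP table and the target_tag helper are
assumptions made for this example, not an agreed LaTeX or tex4ht
mapping.

```python
# Illustrative only: roles come from dpub-aria, the per-backend tags are
# plausible choices and not a mapping defined by LaTeX, tex4ht or lwarp.
ROLE_MAP = {
    "doc-chapter": {"pdf": "Sect", "html5": "section", "odt": "text:section"},
    "doc-abstract": {"pdf": "Div", "html5": "section", "odt": "text:section"},
}


def target_tag(role: str, backend: str) -> str:
    """Return the backend-specific tag for a format-neutral role."""
    try:
        return ROLE_MAP[role][backend]
    except KeyError:
        # Either the role or the backend is missing from the table.
        raise ValueError(
            f"no mapping for role {role!r} in backend {backend!r}"
        ) from None
```

Each backend would only need to consult one such table; everything
after that lookup remains backend-specific.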

But PDF tagging is not only about getting the structure: the main bulk of
the code handles quite PDF-specific stuff like marking the page
streams, building dictionaries and object references, mapping the
structure elements in the recommended way
(https://www.pdfa.org/resource/tagged-pdf-best-practice-guide-syntax/) and similar.
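
To give an idea of the PDF-specific part, here is a minimal Python
sketch that only builds the two strings involved for one paragraph: a
marked-content sequence for the page stream and a structure element
dictionary that refers to it through the MCID. The helper names and
the object references used in the comments ("5 0 R", "3 0 R") are
assumptions for illustration; a real implementation also has to
maintain the parent tree, the structure tree root and more.

```python
def marked_content(tag: str, mcid: int, content_ops: str) -> str:
    """Wrap page-stream operators in a BDC ... EMC marked-content sequence."""
    return f"/{tag} <</MCID {mcid}>> BDC\n{content_ops}\nEMC"


def struct_elem(tag: str, parent_ref: str, page_ref: str, mcid: int) -> str:
    """Build a structure element dictionary pointing at marked content.

    parent_ref and page_ref are indirect references such as "5 0 R"
    (hypothetical object numbers chosen by the caller).
    """
    return (
        f"<< /Type /StructElem /S /{tag} /P {parent_ref} "
        f"/Pg {page_ref} /K {mcid} >>"
    )


# One paragraph: the text operators go into the page stream, the
# dictionary goes into the structure tree as a separate object.
stream_part = marked_content("P", 0, "BT /F1 10 Tf (Hello) Tj ET")
elem_dict = struct_elem("P", "5 0 R", "3 0 R", 0)
```

None of this bookkeeping has a counterpart in HTML export, which is
why so little of it can be shared.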

And in the same way, export to HTML is not only about getting the structure.
The main bulk here is getting the correct text representation (all
these htf font files ...) and configuring the output (CSS, HTML code
etc.).


Ulrike


