[accessibility] Some questions about tagged PDF

Hi Jonathan,

On 12 Dec 2016, at 00:53, Jonathan Fine <jfine2358 at gmail.com> wrote:

Hi Ross

Good to talk with you again.  We wrote:

2. Could a suitable tool create a useful HTML or XML document from a tagged PDF?


3. Is there already such a tool?

Yes. Adobe's Acrobat Pro does this already.
It also exports into RTF and Word formats.
So Tagged PDF provides a good solution for submitting TeX PDFs to a journal that only accepts manuscripts done in M$ Word.

This is interesting. If we can produce (good enough) tagged PDF, we
can from this also produce (good enough) HTML, XML and Word documents.
And I believe that from (good enough) XML we ought to be able to
produce (good enough) tagged PDF.

So we are, in part, also talking about round-tripping typesetting, and
LaTeX to XML.

Yes; but really only “kind of”.

XML is really just a meta-format rather than a format in itself.
What you get depends upon just what kind of information you want to have within the XML file.

PDF has a concept of attribute owner (the “/O” key of an attribute object), which
seems to govern the export formats to which this information can be exported.

Attributes with owner /Layout seem to be exported to HTML, but not to XML-1.00.
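
As a rough sketch of what I mean (Python, purely illustrative): attribute
objects are modelled as plain dictionaries whose "O" key names the owner.
The function name, the lookup table and the sample attributes are
assumptions made up for this example; only the /Layout behaviour is what I
have observed so far, and owners not listed are simply treated as "not yet
known to be exported".

```python
# Hypothetical model of structure-element attribute objects.
# Only the "O" (owner) key and the Layout owner come from the discussion;
# the sample attributes are illustrative.
attributes = [
    {"O": "Layout", "TextAlign": "Center"},
    {"O": "Table", "ColSpan": 2},
]

# Observed export behaviour so far (an observation, not a specification).
# Keys are (owner, export target); only the Layout rows are observed.
OBSERVED_EXPORT = {
    ("Layout", "HTML"): True,
    ("Layout", "XML-1.00"): False,
}


def exported(attrs, target):
    """Return the attribute objects observed to survive export to `target`.

    Owners with no recorded observation are left out (assumption for
    this sketch only).
    """
    return [a for a in attrs if OBSERVED_EXPORT.get((a["O"], target), False)]


print(exported(attributes, "HTML"))
print(exported(attributes, "XML-1.00"))
```

The table would need more rows as the other owners get tested.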

I need to do more exploration into this, as I continue to support more and more LaTeX
environments, for Tagged PDF.


Hope this helps


