[accessibility] Current packages and methods of generating tagged PDF from LaTeX

Thu Jun 27 23:48:25 CEST 2019

Hi Michal, Jason and the LaTeX team.

On 28 Jun 2019, at 12:02 am, Michal Hoftich <michal.h21 at gmail.com> wrote:

> I doubt it, perhaps some parts can be reused but pdf has really rather
> special requirements. Beside this tex4ht is imho quite stable, mature and
> powerful.

> The problem with the PDF syntax is that it seems to be quite
> low-level, basically HTML, so it captures a lot less information than
> what tex4ht already does and it wouldn't work for other output formats
> than HTML (ODT, DocBook,..).

Actually, I do not agree with this statement.

PDF provides mechanisms for:

  1.  using any tag-name that the PDF-writer wishes to use,
      provided it is mapped to a standard PDF type, which could be “NonStruct”.
      (This is the purpose of /RoleMap.)

  2.  associating a particular (or multiple) output-format(s) with any instance
      of a structural tag. (This is the purpose of /ClassMap.)

  3.  associating attributes with such instances, intended to be included
      upon export into a document for each output format.
      (One assigns a /C dictionary, with sub-dictionaries for the elements
      of each desired Class.)
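
As a rough illustration of these three mechanisms, here is a minimal Python sketch that prints PDF dictionary fragments. It is only a sketch under stated assumptions: the tag names (chapter, theorem, aside), the class name and the attribute keys are hypothetical, the layout follows one reading of the PDF 1.7 structure-tree syntax (a /ClassMap entry holding one attribute object per owner, selected by /O), and the structure element omits entries such as /P and /K that a real file needs.

```python
# Hypothetical custom tags, each mapped to a standard PDF structure type.
ROLE_MAP = {"chapter": "Sect", "theorem": "Div", "aside": "NonStruct"}

# Hypothetical attribute class: one attribute object per export format,
# where the owner (/O) names the format the attributes are meant for.
CLASS_MAP = {
    "theorem": [
        {"O": "HTML-4.01", "class": "theorem"},
        {"O": "XML-1.00", "role": "theorem"},
    ],
}


def attr_dict(attrs):
    """Render one attribute object; /O is a name, other values are strings."""
    parts = [f"/O /{attrs['O']}"]
    parts += [f"/{key} ({value})" for key, value in attrs.items() if key != "O"]
    return "<< " + " ".join(parts) + " >>"


def role_map(mapping):
    """Render the /RoleMap entry of the structure tree root."""
    body = " ".join(f"/{custom} /{standard}" for custom, standard in mapping.items())
    return f"/RoleMap << {body} >>"


def class_map(classes):
    """Render the /ClassMap entry: class name -> array of attribute objects."""
    entries = []
    for name, attr_list in classes.items():
        objects = " ".join(attr_dict(a) for a in attr_list)
        entries.append(f"/{name} [ {objects} ]")
    return "/ClassMap << " + " ".join(entries) + " >>"


def struct_elem(tag, class_name=None):
    """Render a (partial) structure element: /S is the tag, /C names its class."""
    c_entry = f" /C /{class_name}" if class_name else ""
    return f"<< /Type /StructElem /S /{tag}{c_entry} >>"


if __name__ == "__main__":
    print(role_map(ROLE_MAP))
    print(class_map(CLASS_MAP))
    print(struct_elem("theorem", "theorem"))
```

The point of the sketch is that the custom tag, its standard role and its per-format attributes live in three separate places, so an exporter can choose whichever set matches its target format.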

Whether software yet exists to output from PDF into formats such as ODT and DocBook
is a moot point. But certainly the possibility exists for this to happen.
And documents can be created now, that should export successfully when such software
is developed in future.

These mechanisms are used by Adobe’s Acrobat upon export into XML, HTML, RTF, etc.

In short, there are already means to capture as much information as you desire.
It is simply a matter of knowing how to do it, and automating it as appropriate.

> If there was a higher level structural tagging standard provided by
> the LaTeX kernel and maintained packages, I think we could reuse that
> information in tex4ht. For example something like dpub-aria:
> https://w3c.github.io/dpub-aria/#doc-chapter
> It shouldn't be hard to map such higher level information to the PDF
> tagging, HTML 5, ODT and other formats.

I attach some pictures here, to show the effect of the RoleMap,
from a document that I’m currently working on.

The first image uses custom tags of the form <lower-case-name>, each with a title that indicates
the purpose of its child content.

[image: first attached screenshot]

The 2nd image shows how this structure is Role-Mapped to standard PDF tag types,
to satisfy the PDF/UA-1 standard.
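
To make the role-mapping step concrete, here is a small Python sketch of how a checker might resolve a custom tag to a standard type. The tag names and the function name resolve are hypothetical, and the set of standard types is only a small subset. The rule it encodes is the PDF/UA-1 requirement that every non-standard structure type is mapped, possibly through several steps, to a standard type, with no circular mappings.

```python
# A small subset of the standard PDF structure types (not the full list).
STANDARD_TYPES = {"Document", "Part", "Sect", "Div", "P", "H1", "Span",
                  "Figure", "Formula", "NonStruct"}


def resolve(tag, role_map, standard=STANDARD_TYPES):
    """Follow the role map from a custom tag until a standard type is reached.

    Raises KeyError if a custom tag has no mapping, and ValueError if the
    mappings loop back on themselves.
    """
    seen = set()
    while tag not in standard:
        if tag in seen:
            raise ValueError(f"circular role mapping involving {tag!r}")
        seen.add(tag)
        if tag not in role_map:
            raise KeyError(f"custom tag {tag!r} is not role-mapped")
        tag = role_map[tag]
    return tag


if __name__ == "__main__":
    # Hypothetical custom tags; 'theorem' reaches 'Div' in two steps.
    example_map = {"theorem": "statement", "statement": "Div", "chapter": "Sect"}
    for custom in ("theorem", "chapter"):
        print(custom, "->", resolve(custom, example_map))
```

Running resolve over every /S value in a structure tree gives a quick check that no custom tag has been left unmapped.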

[image: second attached screenshot]

> Best regards,
> Michal

Jason, another Aussie, originally from Melbourne, same as me!
Great to meet you.

Earlier you said:

> I can also test the output with screen readers and PDF reading tools on each
> of those operating systems. Adobe Reader under Windows and Mac OS supports
> tagged PDF; work was done to support it in Evince under Linux/GNOME, but I
> don't know how far that has progressed; Preview under Mac OS is said to have
> support as well. (Unfortunately, most of the PDF files I receive aren't
> tagged, so I have performed very little testing across operating systems.)

I have quite a large set of Tagged PDF documents, produced using LaTeX and my own macros,
at my site: http://web.science.mq.edu.au/~ross/TaggedPDF/

I’m particularly interested in receiving feedback about the ones having a fair amount of
mathematical/scientific content;
e.g. in the sub-sections:
    Mathematics Tutorial exercises
    ACM journal styles
    Annals of Mathematics, example document

from the point-of-view of Accessibility, and documents valid for PDF/UA.

Please read these using Assistive Technology, and tell me what works well
and/or what does not. In such feedback please say what AT was being used;
JAWS, NVDA or both, other screen-reading software, and/or Braille keyboard.
Use different combinations with the same document, so that comparisons
can be established and documented.

Also, I’m intrigued by your statement that “Preview under Mac OS” has support
for Tagged PDF. I do all my work on a MacBook, and have never seen that.
Can you provide a link to some descriptive document?

Apple’s VoiceOver does a really bad job with my PDFs, even though they
are fully validating for the PDF/A and PDF/UA standards.
Apple has previously been rather poor at supporting things in the PDF standards
that they do not use in their own software.
It would be really great if that attitude is changing.

All the best.

Ross

Dr Ross Moore
Department of Mathematics and Statistics
12 Wally’s Walk, Level 7, Room 734
Macquarie University, NSW 2109, Australia
T: +61 2 9850 8955  |  F: +61 2 9850 8114
M: +61 407 288 255  |  E: ross.moore at mq.edu.au
http://www.maths.mq.edu.au
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Screen Shot 2019-06-28 at 7.14.55 am.png
Type: image/png
Size: 630926 bytes
Desc: Screen Shot 2019-06-28 at 7.14.55 am.png
URL: <https://tug.org/pipermail/accessibility/attachments/20190627/be02bb3c/attachment-0003.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Screen Shot 2019-06-28 at 7.15.19 am.png
Type: image/png
Size: 629706 bytes
Desc: Screen Shot 2019-06-28 at 7.15.19 am.png
URL: <https://tug.org/pipermail/accessibility/attachments/20190627/be02bb3c/attachment-0004.png>