[accessibility] Current packages and methods of generating tagged PDF from LaTeX

Sun Jun 30 23:52:20 CEST 2019

Hi Jason and Ulrike,

On 1 Jul 2019, at 7:23 am, Ulrike Fischer <fischer at troubleshooting-tex.de> wrote:

> As a naïve user of the tagpdf package, here are some initial comments.
>
> The acmart class uses the Libertine fonts. I noticed in the output
> of Adobe Reader that letters occurring in ligatures were omitted,
> suggesting that there was an issue with the mapping of those glyphs
> to the correct Unicode sequences.

You may have noticed that some of the example documents on my site
(link sent in an earlier email) include ACM journal samples.

So I’ve faced these font-related issues, and many others, already.

There are many aspects of Accessibility that are not related to having Tagged PDF.
Correct mapping of fonts is just one of them.
Having explicit interword spaces is another. Setting the document title is another.
There are many more.

Also, putting the accent *after* the base character is a definite issue with text-extraction,
when the font doesn’t support the fully composed accented character.
And there can be other font-related issues here.
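
As a rough illustration of the first two points, with pdfLaTeX the glyph-to-Unicode mapping and explicit interword spaces can be requested with primitives like the ones below. This is only a minimal sketch, assuming pdfTeX from TeX Live 2014 or later (needed for \pdfinterwordspaceon). The sample text is made up, and this is not necessarily what acmart or the Libertine packages do internally.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}
% Load the standard table that maps glyph names to Unicode code points,
% then ask pdfTeX to write ToUnicode CMaps for the fonts it embeds.
\input{glyphtounicode}
\pdfgentounicode=1
% Insert real space characters between words in the PDF content stream,
% so that text extraction does not have to guess word boundaries.
\pdfinterwordspaceon
\begin{document}
An efficient office shuffle: words containing the ``ffi'' and ``ffl''
ligatures should still copy and paste as separate letters.
\end{document}
```

Copying the sentence out of the resulting PDF in a viewer is a quick check that the ligature glyphs map back to their letters.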

Most, if not all, of the non-tagging-related issues are dealt with by the pdfx.sty package,
as these are issues for the PDF/A standards as well as for accessibility.

I would advise you to start all test documents by loading this package, and to set up useful,
meaningful and quite detailed metadata for the XMP packet in the PDF.
Also, specify which PDF standard you want your document to satisfy.
Some of my examples have LaTeX source which shows you how to do this.
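
For readers without those examples to hand, a minimal sketch of such a preamble might look like this. It assumes pdfLaTeX and a current pdfx. All the metadata values are placeholders, and a-2u (PDF/A-2u) is just one of the standards the package accepts as an option.

```latex
% Write the metadata file that pdfx reads; it must be named \jobname.xmpdata.
% (With recent LaTeX kernels an existing file is not overwritten.)
\begin{filecontents*}{\jobname.xmpdata}
\Title{Sample article for accessibility testing}
\Author{A. N. Author}
\Language{en-AU}
\Subject{Test document for PDF/A validation}
\Keywords{tagged PDF\sep accessibility\sep PDF/A}
\end{filecontents*}

\documentclass{article}
% The package option selects the target standard; a-2u requests PDF/A-2u.
\usepackage[a-2u]{pdfx}
\begin{document}
Hello, world.
\end{document}
```

The metadata is kept in a separate .xmpdata file, written here from the document itself, so the same source both declares the target standard and feeds the XMP packet.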

Even if you don’t really need standards compliance, it will mean that you can naturally run Acrobat Pro’s validators,
to get meaningful feedback on the issues that remain within your newly created PDF.
I’m sure you are already familiar with those tests, but maybe not so with LaTeX-generated PDFs.

> Unfortunately, I can't share the document yet, due to copyright
> considerations, but that situation may change.

Yes.
Most users do not want to go through adjusting their source for tagging each paragraph, list item, heading, etc.
And this is impossible for things generated automatically, like the Bibliography, Tables of Contents, etc.

It should all just happen automatically.
And eventually it will.

That is the approach that I take: top-down rather than bottom-up.
Of course you need a firm code base for this to work,
and there are many places where things can go a little bit wrong.
Often it’s not until you do the validation with Acrobat Pro that errors become apparent.
That’s why I don’t share my basic coding until I’m fully satisfied an author can and will do such testing,
and can properly interpret the error messages that inevitably result.

Eventually Ulrike’s coding and mine will need to be merged.
In the meantime, I’m happy to look at your example whenever you are willing to share it,
as it may reveal places where more work needs to be done.

In my experience, every document that I’ve used for testing has revealed
some new detail that needs proper support.

> Here too: if you see problems, add an issue to the GitHub tracker. You
> don't need to share confidential documents. Simply replace the text by
> dummy text.
>
> Ulrike Fischer

Hope this helps.

Ross

Dr Ross Moore
Department of Mathematics and Statistics
12 Wally’s Walk, Level 7, Room 734
Macquarie University, NSW 2109, Australia
T: +61 2 9850 8955  |  F: +61 2 9850 8114
M: +61 407 288 255  |  E: ross.moore at mq.edu.au
http://www.maths.mq.edu.au