[accessibility] Current packages and methods of generating tagged PDF from LaTeX

easjolly at ix.netcom.com easjolly at ix.netcom.com
Fri Jun 28 05:59:42 CEST 2019

I don’t know very much about the technical aspects of this specific problem but I would like to present some of my thoughts that might lead to additional discussion.


My experience in dealing with creating and interconverting file formats in other contexts is that the standard advice is to first create a so-called neutral file that may or may not be one of the target formats.  Then each target format is produced from the neutral file. This approach is of course intended to reduce the number of converters needed and to simplify adding new target formats at a later time.
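
As a rough illustration of why the neutral-file approach reduces the number of converters, the sketch below simply counts them both ways. The function name and the example numbers are made up for illustration and do not come from any particular tool.

```python
def converters_needed(num_sources: int, num_targets: int) -> dict:
    """Count converters for direct conversion vs. conversion via one neutral format.

    Illustrative only: assumes every source format must reach every target format.
    """
    return {
        # One dedicated converter per (source, target) pair.
        "direct": num_sources * num_targets,
        # One converter per source into the neutral file,
        # plus one per target out of the neutral file.
        "via_neutral": num_sources + num_targets,
    }


# Hypothetical example: 3 authoring formats, 4 output formats.
counts = converters_needed(3, 4)
print(counts)
```

Adding a fifth target format later would cost one new converter with a neutral file, compared with three under the direct scheme.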


I understand that here we are addressing the issue that a large percentage of documents are currently authored in some flavor of LaTeX and that this situation is unlikely to change. So far, tagged PDF seems to be regarded as the best neutral format. If that is correct, one question is whether it really is the best option.


One item that got me thinking about other options is the solution described in the article “Creating PDF documents with accessible formulae” by Ahmetovic et al. in TUGboat 39(3), p. 224. The article proposes a method for retaining, in the created PDF document, a hidden copy of the LaTeX source associated with each math expression, so that it is available for generating accessible formulae.


Now that digital storage seems virtually unlimited via the cloud, I wonder whether another option for retaining the source would be some protocol for separately storing the entire LaTeX source used to create a PDF file. At the very least, this would seem to have the advantage that it could be done quickly and independently of improving tagging or developing a different neutral file format.


In the same issue of TUGboat, on p. 173, there is an article by Schulz and Koch on file encoding and TeXShop. It points out the need to let LaTeX or another typesetting engine know which file encoding was used. Could something similar be used to tell a renderer where to store or access its source file? Of course, the rendered document would also need a copy of the storage location. Could this be part of its metadata?
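
To make the question a little more concrete, here is a minimal sketch of how a tool might read such a declaration from the top of a source file. TeXShop's existing encoding line has the form `% !TEX encoding = UTF-8 Unicode`; the `source-location` key, the example URL, and the function name below are purely hypothetical and are not part of TeXShop or any existing tool.

```python
import re

# Matches comment lines of the form "% !TEX key = value".
DIRECTIVE = re.compile(r"^%\s*!TEX\s+([\w-]+)\s*=\s*(.+?)\s*$", re.IGNORECASE)


def read_directives(source_text: str, max_lines: int = 20) -> dict:
    """Collect '% !TEX key = value' lines from the first few lines of a source file."""
    directives = {}
    for line in source_text.splitlines()[:max_lines]:
        match = DIRECTIVE.match(line)
        if match:
            # Keys are lower-cased so lookups do not depend on capitalization.
            directives[match.group(1).lower()] = match.group(2)
    return directives


sample = (
    "% !TEX encoding = UTF-8 Unicode\n"
    # Hypothetical key: not recognized by any current editor or engine.
    "% !TEX source-location = https://example.org/papers/main.tex\n"
    "\\documentclass{article}\n"
)
found = read_directives(sample)
print(found.get("source-location"))
```

A renderer that understood such a line could then copy the location into the metadata of the document it produces; how best to do that is left open here.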


Best wishes,

Susan Jolly


