[accessibility] Accessible TeX rendering (was Re: TeX Hour: Thu 21 and 28 October: Durable conversion to accessible outputs)

Ross Moore ozross at icloud.com
Sun Oct 31 02:48:09 CEST 2021


Hi Jason, Jonathan and others.

>> From: Jason White via accessibility <accessibility at tug.org>
>> Date: 31 October 2021 at 3:04:56 am AEDT
>> To: accessibility at tug.org
>> Subject: [accessibility] Accessible TeX rendering (was Re: TeX Hour: Thu 21 and 28 October: Durable conversion to accessible outputs)
>> Reply-To: Jason White <jason at jasonjgw.net>
>> 
>> 
>> 
>> On 20/10/21 14:34, Jonathan Fine wrote:
>>> Thu 21 October: 6:30 to 7:30pm: What is an accessible TeX rendering pipeline?
>> That's a good question and an important topic. I have a relatively simple solution at the moment: a makefile that converts my LaTeX source to PDF and HTML.
>> 
>> At present, I am using the lwarp package to generate the HTML output, and LuaLaTeX for the PDF. However, it would be straightforward to use TeX4ht or LaTeXML instead.
>> 
>> The main problems, from my limited perspective, are as follows.
>> 
>> 1. Not knowing which LaTeX packages are compatible with producing high-quality HTML output, and addressing incompatibilities that do occur. If the HTML conversion were standardized and supported directly within the packages themselves (e.g., with code that specifies structural tags/elements), I think this would be easier from a user's point of view.
>> 

>> I hope the development of support for tagged PDF will have the side effect of improving HTML processing. I care about quality HTML output, but not so much about PDF tagging.
>> 
It most certainly will.

I have long been of the opinion that the best LaTeX to HTML conversion is done by Acrobat Pro’s “Save as HTML”,
from the PDF document view. This works very well for most untagged PDFs, creating well-sized images when necessary.

But there is an even better HTML converter for Tagged PDFs.

Do a Google search for “Next Generation PDF” (ngPDF).
I’m having great success incorporating CSS rules directly into the tagged PDF of documents valid for PDF/UA,
so as to improve the layout in HTML without affecting the visual form of the PDF at all.
E.g., (in HTML)
 • splitting a long list of short items into 2 columns;
 • controlling the size of images;
 • centering images, with and without accompanying captions;
 • floating largish blocks to allow side-by-side presentation;
 • italicising whole paragraphs, when appropriate for special reasons;
 • presenting ToC-like material as an apparent description-list, when it is actually an enumerated list;
 • laying out a titlepage-like presentation having logo, styling, background and borders;
 • and many more effects.
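
A rough sketch of the kind of CSS rules that effects like those listed above involve. The class names are made up purely for illustration, and the way such rules and classes are attached to the tagged PDF is not shown here:

```css
/* All class names below are hypothetical; they are assumed to arrive in the
   exported HTML as class attributes on the corresponding elements. */

/* Split a long list of short items into 2 columns. */
ul.two-col { column-count: 2; column-gap: 2em; }

/* Control image size and center images, with or without a caption. */
figure.centered { text-align: center; }
figure.centered img { max-width: 60%; height: auto; }

/* Float a largish block to allow side-by-side presentation. */
.side-block { float: right; width: 45%; margin-left: 1em; }

/* Italicise a whole paragraph. */
p.special { font-style: italic; }

/* Present an enumerated list without its numbers, ToC-style. */
ol.toc-like { list-style: none; padding-left: 0; }
```

None of these rules changes the visual form of the PDF; they only apply once the structure has been exported to HTML.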

ngPDF also comes with an Editor, which allows you to tweak the document structure,
producing a new PDF which you can download.
How well this works I cannot say, as I produce the tagged PDFs myself using LaTeX,
and only make use of the HTML conversion.


>> 2. The fact that there is no standard HTML production pipeline for LaTeX, but rather a multiplicity of tools with similar functionality (but differences of detail in what is supported and what output is produced).
>> 
An intention of PDF/UA is that other structured formats can be exported from the PDF, in a similar way to how ngPDF does HTML.
Acrobat Pro already exports to: XML, RTF, Word, Excel, Powerpoint, HTML
and various Text formats (having stripped away the structure tagging).

But nothing comes for free: Garbage In ==> Garbage Out certainly applies.
You need to develop a decent structural backbone for your documents, become familiar with how to encode this,
and provide appropriate attributes and class definitions to get the best results after Export.

>> 3. There is support for providing a text alternative to graphical content in LaTeX now, equivalent to the HTML ALT attribute. However, there doesn't appear to be a standard mechanism for providing extended descriptions (e.g., containing tables, paragraphs or other structural elements).
>> 
This is all possible, and under development.

>> It should also be noted that mathematical notation doesn't have a significant role in my current work, so I am effectively avoiding mathematics accessibility issues by not needing this aspect of LaTeX much. When I wrote my Ph.D. thesis in LaTeX, there were occasional logic symbols and variables in the technical chapters, and in that context, the ability to include the notation correctly (and to edit it in a completely accessible manner) was important.
>> 
>> I have also used Pandoc Markdown and AsciiDoc, but I keep coming back to LaTeX for all of my substantial writing due to the wealth of packages available in TeX Live and the excellent facilities for use in scholarly manuscripts. Anyone with a complete TeX Live installation could work with my documents, whereas, if I wrote them in Markdown, for instance, anyone else who wished to build them would have to install extensions and use specific command line options or makefiles - and I would have to rely on various extensions that may or may not be maintained.
>> 
>> I think LaTeX currently has a more mature software environment than the "light-weight" markup languages provide, at least in relation to my needs.
>> I also appreciate having the typographical details decided by specialists in that domain (namely, the authors of LaTeX and its packages).
>> 
>> The ability to use word diffs in Git to compare revisions of a document (a feature that has an option to respect LaTeX syntax) also proves useful at times.
>> 


Hope this gives you a flavour of what is to come.


All the best.

	Ross


Dr Ross Moore
Department of Mathematics and Statistics 
12 Wally’s Walk, Level 7, Room 734
Macquarie University, NSW 2109, Australia
T: +61 2 9850 8955  |  F: +61 2 9850 8114
M:+61 407 288 255  |  E: ross.moore at mq.edu.au
http://www.maths.mq.edu.au

CRICOS Provider Number 00002J. Think before you print. 
Please consider the environment before printing this email.

This message is intended for the addressee named and may 
contain confidential information. If you are not the intended 
recipient, please delete it and notify the sender. Views expressed 
in this message are those of the individual sender, and are not 
necessarily the views of Macquarie University. <http://mq.edu.au/>