[pdftex] making pdf document accessible using LaTeX

Ross Moore ross at ics.mq.edu.au
Sat Nov 17 08:12:22 CET 2007

Hi Neil,

On 17/11/2007, at 7:50 AM, Neil Soiffer wrote:

> Ross,
> I like the name "serendiPDF"... too bad there wasn't a way to squeeze an "e"
> sound on the end :-)


> From your tug article (www.tug.org/TUGboat/Articles/tb23-1/moore.pdf), it
> sounds like you had multiple reasons for developing serendiPDF.  One was for
> teaching TeX in some sense (TeX by example) and another was for searching
> within the document.  Have you had any feedback on either?

Reasons or justification?  A bit of each.
I believe there is value in having the TeX source of mathematics,
and other kinds of structured information, available within the PDF.
Certainly it could be a MathML representation, but even then I would
include TeX source as the presentation form of it, as this would be
easier for most people to use.

The lack of feedback indicates that either others don't care about it,
or I've not been presenting my work in the correct places.

> It wasn't clear from your article how the TeX is represented in the PDF.
> The PDF isn't tagged, so I'd guess they are annotations.

Yes. Ultimately it is \pdfannot that puts it into the PDF.

> I realize that accessibility wasn't your goal, but putting them in as Alt
> Tags would have made the document much more readable/intelligible to
> someone using a screen reader.  It would also mean that search engines
> would likely find the TeX.  I'm sure someone will correct me if I'm wrong,
> but I don't think search engines look at PDF annotations.

I've never looked at tagged PDF, until today.
So previously I was using what I knew how to do, devising a way
to get visible feedback on the presence of the text-fields holding
the source TeX of mathematical content.

Everything had to work in existing PDF browsers --- well, at least
in Adobe Reader. (However, it had to be more elegant and subtle
than the clunky annotation/comment-markers that AR uses.)

> In your paper you described a macro package that did some pretty clever
> hacks to insert the TeX.  If you didn't need to create forms/targets for
> the JavaScript and just wanted to create Alt tags with the TeX, it seems
> that it would have been simpler to get them into the PDF

After playing a bit with Adobe Pro 8, I see now how it can be used
to create a tree that encodes a document's structure, optionally
tagged with meaningful text strings.

> -- you wouldn't need to do the measuring in that case.

For some kinds of tags, yes.

But for others, such as the /Formula tag, there can still be a need
to know the height and width; indeed an exact location:

11 0 obj
<</A 13 0 R/K 4/P 4 0 R/S/Formula/T()/Pg 21 0 R>>
13 0 obj
<</O/Layout/Width 20.6301/Height 12.1448/BBox 14 0 R>>
14 0 obj
[371.644 521.321 392.275 533.466]

This is presumably because of the quasi-2D layout
of mathematical expressions.
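
As a rough check on the objects above, the /Width and /Height in object 13
appear to be simply the extents of the /BBox rectangle in object 14. A
minimal Python sketch, with the numbers copied from the example; the
function name and the tolerance are assumptions made for illustration:

```python
def bbox_extents(bbox):
    """Return (width, height) of a PDF rectangle [llx lly urx ury]."""
    llx, lly, urx, ury = bbox
    return urx - llx, ury - lly

# Values copied from objects 13 and 14 in the example above.
bbox = [371.644, 521.321, 392.275, 533.466]
declared_width, declared_height = 20.6301, 12.1448

width, height = bbox_extents(bbox)

# The BBox is printed to three decimals, so allow a small rounding
# tolerance (0.005 is an arbitrary choice for this sketch).
TOLERANCE = 0.005
width_ok = abs(width - declared_width) < TOLERANCE
height_ok = abs(height - declared_height) < TOLERANCE
print(width_ok, height_ok)
```

Both comparisons agree to within rounding, which suggests the layout
attributes are derived from the position of the typeset box on the page.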

> Do you think it is something that pdftex itself could/should do, or do
> you think adding tags is best left to a macro package?

It must be pdfTeX itself, otherwise exact locations, such as in the  
example, cannot be known.

> Is it something you could do to your macro package?  If you do add the
> Alt tag, make sure you add a "Formula" tag around the math parts also --
> both the PDF spec and draft for PDF/UA say it should be used.

Even without the location issue, I think it would be extremely difficult
to build a structure tree within macros.

Furthermore, once built, it needs to be linked to the pages tree,
suitably modified to be aware that a structure tree is present.
Again, I suspect that only pdfTeX can do this properly.
But I'm saying this without having any explicit experience,
so I could well be wrong.
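
To make the linking concrete, here is a rough Python sketch that prints the
kinds of objects involved. The object numbers, the pdf_obj helper and the
/Alt text are hypothetical; only the dictionary keys (/MarkInfo,
/StructTreeRoot, /ParentTree, /StructParents, /MCID) are taken from the
PDF specification. These are fragments, not a complete PDF file.

```python
# Sketch only: object numbers are made up; the dictionary keys are the
# ones the PDF specification defines for tagged PDF.

def pdf_obj(num, body):
    """Wrap a dictionary body as an indirect PDF object (hypothetical helper)."""
    return f"{num} 0 obj\n{body}\nendobj"

# The catalog must announce the structure tree and mark the file as tagged.
catalog = pdf_obj(1, "<</Type/Catalog/Pages 2 0 R"
                     "/MarkInfo<</Marked true>>/StructTreeRoot 3 0 R>>")

# Root of the structure tree, with a parent tree for the reverse lookup.
struct_root = pdf_obj(3, "<</Type/StructTreeRoot/K 4 0 R/ParentTree 5 0 R>>")
document = pdf_obj(4, "<</Type/StructElem/S/Document/P 3 0 R/K[11 0 R]>>")

# Number tree: key 0 is the page's /StructParents value; the array is
# indexed by MCID, so MCID 0 on that page belongs to element 11.
parent_tree = pdf_obj(5, "<</Nums[0 [11 0 R]]>>")

# The formula element: /K 0 is the MCID, /Pg the page, /Alt the TeX source.
formula = pdf_obj(11, "<</Type/StructElem/S/Formula/P 4 0 R"
                      "/Pg 21 0 R/K 0/Alt(x^2+y^2)>>")

# The page has to be modified too: it gains a /StructParents key.
page = pdf_obj(21, "<</Type/Page/Parent 2 0 R/Contents 22 0 R"
                   "/StructParents 0>>")

# Inside the page's content stream, the math is bracketed as marked content.
marked_content = ("/Formula <</MCID 0>> BDC\n"
                  "% ...operators drawing the math...\n"
                  "EMC")

for obj in (catalog, struct_root, document, parent_tree, formula, page):
    print(obj, end="\n\n")
print(marked_content)
```

The point of the sketch is that three separate places must agree: the
catalog, the page dictionary, and the content stream. A macro package can
write raw objects, but keeping the MCID values and page references
consistent across all of them is the part that seems to call for support
inside the engine.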

> Standing up on my soapbox...
> Being someone working on accessibility, I obviously think that TeX should
> produce something that is accessible.  But I'm not alone.  Many governments
> and lots of Universities are developing accessibility requirements for
> electronic material on their web sites.  This was the chief motivating
> factor for AIIM to start up a standards effort.

A big problem for TeX is indicating word boundaries.
Have you tried getting Adobe Reader to read through a typical
document created by pdfTeX?

> I hope the developers on this list make accessibility of the PDF a priority
> by producing tagged PDF that includes the TeX, or better yet, a MathML
> equivalent for the math parts of documents.  Sadly, accessibility is an area
> where pdftex is behind where it should be.

Surely creating full structure trees should be a priority.
ALT and /Formula tags would be just a natural part of this.

> Neil Soiffer
> Senior Scientist
> Design Science, Inc.
> www.dessci.com
> ~ Makers of Equation Editor, MathType, MathPlayer and MathFlow ~



> ----- Original Message -----
> From: "Ross Moore" <ross at ics.mq.edu.au>
> To: "Neil Soiffer" <NeilS at dessci.com>
> Cc: <pdftex at tug.org>
> Sent: Wednesday, November 14, 2007 12:19 PM
> Subject: Re: [pdftex] making pdf document accessible using LaTeX
>> Hi Neil,
>> On 15/11/2007, at 5:39 AM, Neil Soiffer wrote:
>>>> By the way, I have been pointed to a developing NISO standard for
>>>> accessible maths, that would be similar to this with MathML, if I
>>>> understood well. A variation of Design Science's MathPlayer would have
>>>> been able to read aloud such a PDF, including the maths read not as ASCII
>>>> source, but as real maths.
>>> I think the standard you are referring to is the DAISY+MathML spec.  It is
>>> available at
>>> www.daisy.org/projects/mathml
>>> Another effort people might be interested in is PDF/UA (Universal Access).
>>> This is an AIIM committee that is working on developing an ISO standard for
>>> accessible PDF. See www.aiim.org/standards.asp?ID=27861.  Part of that work
>>> involves making sure that math in PDF is accessible.  It does this by
>>> tagging the math with MathML.
>> Back in 2002, I developed a method to include the TeX source of mathematics
>> as popup text-fields in a PDF, generated by pdfLaTeX. As the mouse tracks over
>> the displayed mathematics, a button appears to toggle show/hide of the field.
>> Furthermore, the fields are searchable, so I added a simple search widget,
>> and JavaScript methods to implement searching.
>> I called this technique serendiPDF, and showed an example at TUG 2002, India.
>> The attached PDF is the example that I used there.

Ross Moore                                         ross at maths.mq.edu.au
Mathematics Department                             office: E7A-419
Macquarie University                               tel: +61 +2 9850 8955
Sydney, Australia  2109                            fax: +61 +2 9850 8114
