[pdftex] experiment with tagged PDF

Paul Topping pault at dessci.com
Tue Apr 29 22:38:44 CEST 2008

We already have a prototype version of our MathPlayer software that plugs
into Adobe Reader. (Currently, MathPlayer is only publicly available as
a free downloadable plugin for the Internet Explorer browser.) The
intention is that it will support screen readers that work with Adobe
Reader, but it will also be useful to all users because it will allow
math in PDF to be copied via the clipboard into other applications. We
are also working on mathematical search and, clearly, tagged math will
be required for math in PDF to be "seen" by search engines.

Math tagged with MathML will be our first choice for better
accessibility and interoperability, but math tagged only with TeX will
still be quite useful. We have technology to convert TeX to MathML on
the fly and, of course, the TeX itself is useful: TeX-savvy users should
be able to copy an equation's TeX code to the clipboard as well. If the
equation is tagged only with MathML, it can also be converted to TeX for
that purpose.
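
To make that preference order concrete, here is a minimal sketch in
Python. It is only an illustration, not MathPlayer's actual code: the
TaggedFormula record, the best_mathml and best_tex functions, and the
two converter callables are all hypothetical names introduced for this
example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class TaggedFormula:
    """Hypothetical record for one tagged formula read from a PDF."""
    mathml: Optional[str] = None  # MathML markup, if the formula carries it
    tex: Optional[str] = None     # TeX source, if the formula carries it


def best_mathml(formula: TaggedFormula,
                tex_to_mathml: Callable[[str], str]) -> Optional[str]:
    """Prefer stored MathML; otherwise convert the TeX tag on the fly."""
    if formula.mathml is not None:
        return formula.mathml
    if formula.tex is not None:
        # tex_to_mathml is an assumed converter supplied by the caller
        return tex_to_mathml(formula.tex)
    return None  # untagged formula: nothing to offer


def best_tex(formula: TaggedFormula,
             mathml_to_tex: Callable[[str], str]) -> Optional[str]:
    """Prefer stored TeX; otherwise derive it from the MathML tag."""
    if formula.tex is not None:
        return formula.tex
    if formula.mathml is not None:
        # mathml_to_tex is likewise an assumed converter
        return mathml_to_tex(formula.mathml)
    return None
```

A screen reader would ask for best_mathml, while a "copy as TeX"
command would ask for best_tex; either one returns None when the
formula carries no tag at all.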

Right now, I think we have a sort of chicken-and-egg problem: there is
not much interest in these things because no one has had a chance to
play with them, and there is not much tagged math to play with. I expect
that online content containing tagged math will reach critical mass
soon. It will start with HTML, as things are a little farther along
there, but once people get used to having useful math in their HTML
content, demand for the same in PDF will immediately follow. After all,
PDF/UA got started because Adobe and other PDF people saw what was
happening with screen readers in web browsers and did not want to be
left behind.

Paul Topping
Design Science, Inc.

> -----Original Message-----
> From: pdftex-bounces at tug.org [mailto:pdftex-bounces at tug.org] 
> On Behalf Of Thierry Bouche
> Sent: Tuesday, April 29, 2008 4:04 AM
> To: pdfTeX list
> Subject: Re: [pdftex] experiment with tagged PDF
> Hi Neil, Thanh, & others,
>
> N> For the math part, make sure you tag the math as "formula".
> N> Ideally, you should tag each subexpression with the appropriate
> N> MathML element name (eg, "mfrac" for fractions), but at the very
> N> least, add a "tex" attribute to "formula" and include the TeX
> N> string.
>
> I think this is really something we are missing today but I am not
> sure I understand the implications: Would this help searching using
> tex code inside the formulas? Would this be solely exposed to
> nonvisual PDF screen reader, which would select what kind of
> alternative text they consume based on a format-type attribute?
> In this case, is it foreseen that any tex-aware screen reader will
> ever exist?
>
> Given that there are no Unicode 3.0 math fonts around (or that
> not all math will be typeset with STIX hopefully anyway...), the
> characters string used to print math glyphs is useless for
> accessibility. Sometimes, the unicode character can be recovered from
> the glyph name in the font, or a ToUnicode if present. But not so
> often in our brave pdftex/CM paradigm. Does the tagging
> infrastructure in pdftex's patch go as far as trying to match each
> printed glyph, math or text, to a unicode char? Would it allow for
> using external processes such as tralics that would be fed with a
> constant header, and the tex string of the formula, so that it could
> be possible to add a Formula tag with pMathML content and tex source
> in alt? (which seems to me the best we can hope for accessibility and
> functionality, unless I am completely misdirected)?
>
> N> You could also add an "alt" attribute to "formula" that contains
> N> the TeX, but as "alt" is meant to be human readable, it is
> N> questionable whether TeX is really appropriate there.
>
> Indeed, to me (and most working mathematicians), the tex code is
> precisely the most portable, readable, useful fallback textual version
> for a math formula. It is even what we'd dream to copy-paste from a
> PDF (or HTML) with our today's working environment!
>
> Remember Knuth said 'math coding in tex is like telling formulas with
> a colleague over the phone'?
>
> So putting the tex code in alt is not necessarily appropriate to
> anyone, but it is the only fully textual human-readable format bearing
> unambiguous math of any level (up to author's macros)...
>
> So many questions!
>
> Th.
