[pdftex] experiment with tagged PDF

Thierry Bouche thierry.bouche at ujf-grenoble.fr
Tue Apr 29 13:04:14 CEST 2008


Hi Neil, Thanh, & others,

N> For the math part, make sure you tag the math as "formula".  Ideally, you
N> should tag each subexpression with the appropriate MathML element name (eg,
N> "mfrac" for fractions), but at the very least, add a "tex" attribute to
N> "formula" and include the TeX string.

I think this is really something we are missing today, but I am not sure
I understand the implications. Would this help with searching for tex code
inside formulas? Or would it be exposed only to non-visual PDF consumers
such as screen readers, which would select the kind of alternative text
they consume based on a format-type attribute?

In that case, is any tex-aware screen reader ever expected to exist?

Given that there are no Unicode 3.0 math fonts around (and that,
hopefully, not all math will be typeset with STIX anyway...), the
character string used to print math glyphs is useless for
accessibility. Sometimes the Unicode character can be recovered from
the glyph name in the font, or from a ToUnicode CMap if present, but not
so often in our brave pdftex/CM paradigm. Does the tagging infrastructure
in the patch for pdftex go as far as trying to match each printed glyph,
math or text, to a Unicode character? Would it allow the use of external
processes such as tralics, fed with a constant header and the tex string
of the formula, so that a Formula tag could be added with presentation
MathML (pMathML) content and the tex source in alt? That seems to me the
best we can hope for in accessibility and functionality, unless I am
completely misguided.
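
To make the glyph-name route concrete, here is a rough Python sketch of how
a Unicode value might be recovered from a glyph name. It is only an
illustration under stated assumptions: it handles the "uniXXXX" and
"uXXXX" naming conventions and a three-entry excerpt of the Adobe Glyph
List, and it ignores ligature names joined with underscores. The function
name and the sample table are hypothetical, and nothing here reflects what
the pdftex patch actually does.

```python
import re

# "uniXXXX" (one or more groups of 4 uppercase hex digits) and "uXXXX".."uXXXXXX".
_UNI_NAME = re.compile(r"^uni((?:[0-9A-F]{4})+)$")
_U_NAME = re.compile(r"^u([0-9A-F]{4,6})$")

# Tiny illustrative excerpt only; a real tool would load the full glyph list.
SAMPLE_GLYPH_LIST = {
    "alpha": "\u03b1",
    "summation": "\u2211",
    "integral": "\u222b",
}


def glyph_name_to_unicode(name):
    """Return the Unicode string for a glyph name, or None if it is unknown."""
    base = name.split(".", 1)[0]  # drop a variant suffix such as ".alt"
    match = _UNI_NAME.match(base)
    if match:
        digits = match.group(1)
        return "".join(
            chr(int(digits[i:i + 4], 16)) for i in range(0, len(digits), 4)
        )
    match = _U_NAME.match(base)
    if match:
        return chr(int(match.group(1), 16))
    # Fall back to the (here heavily truncated) name table.
    return SAMPLE_GLYPH_LIST.get(base)
```

A name that is neither in the table nor in one of the two conventions
yields None, which is exactly the case where a ToUnicode CMap or some
external mapping would be needed.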

N> You could also add an "alt" attribute
N> to "formula" that contains the TeX, but as "alt" is meant to be human
N> readable, it is questionable whether TeX is really appropriate there.

Indeed, to me (and to most working mathematicians), the tex code is
precisely the most portable, readable, and useful textual fallback
for a math formula. It is even what we would dream of copy-pasting from
a PDF (or HTML) in today's working environment!

Remember Knuth said 'math coding in tex is like telling formulas to a
colleague over the phone'?

So putting the tex code in alt is not necessarily appropriate for
everyone, but it is the only fully textual, human-readable format that
carries unambiguous math of any level (up to the author's macros)...
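
As a rough illustration of what "tex code in alt" could look like at the
PDF level, the Python sketch below builds the text of a Formula structure
element dictionary whose /Alt entry holds the tex source. The function
names are hypothetical, the dictionary is deliberately incomplete (a real
structure element also needs at least its parent /P and its content /K),
and only ASCII input is handled; non-ASCII text would need a different
string encoding. The main point is that every backslash in the tex source
has to be escaped inside a PDF literal string.

```python
def pdf_literal_string(text):
    """Escape ASCII text as a PDF literal string, e.g. (a\\(b\\))."""
    escaped = (
        text.replace("\\", "\\\\")  # backslashes first, so later escapes survive
        .replace("(", "\\(")
        .replace(")", "\\)")
    )
    return "(" + escaped + ")"


def formula_struct_elem(tex_source):
    """Return a partial /Formula structure element carrying tex in /Alt."""
    return "<< /Type /StructElem /S /Formula /Alt %s >>" % pdf_literal_string(
        tex_source
    )


print(formula_struct_elem(r"\frac{a}{b}"))
```

For the input \frac{a}{b}, the /Alt string comes out as (\\frac{a}{b}),
which a PDF reader decodes back to the original tex.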

So many questions!

Th.


