[pdftex] experiment with tagged PDF

The Thanh Han hanthethanh at gmail.com
Wed Apr 30 12:43:26 CEST 2008

Hi all,

I find the inputs very useful -- I hope to get there someday
soon. At the moment I am still experimenting with the very
basic stuff -- documents similar to "hello, world!",
small2e.tex and the like, just to learn how one makes tagged
pdfs manually (using \pdfobj).


On Tue, Apr 29, 2008 at 01:38:44PM -0700, Paul Topping wrote:
> We already have a prototype version of our MathPlayer software that plugs
> into Adobe Reader. (Currently, MathPlayer is only publicly available as
> a free downloadable plugin for the Internet Explorer browser.) The
> intention is that it support screen readers that work with Adobe Reader
> but it will also have functionality for all users in that it will allow
> math in PDF to be copied via the clipboard into other applications. We
> are also working on mathematical search and clearly tagged math will be
> required for math in PDF to be "seen" by search engines.
>
> Math tagged with MathML will be our first choice for better
> accessibility and interoperability but math tagged only with TeX will
> still be quite useful. We have technology to convert TeX to MathML on
> the fly and, of course, the TeX itself is useful. TeX-savvy users should
> be able to copy an equation's TeX code to the clipboard as well. If the
> equation is tagged only with MathML, it can also be converted to TeX for
> that purpose.
>
> Right now, I think we have a sort of chicken-and-egg problem in that
> there is not much interest in these things as no one has had a chance to
> play with them and there's not much math that is tagged. I expect that
> online content containing tagged math will reach critical mass soon. It
> will start with HTML as things are a little farther along there but,
> once people get used to having useful math in their HTML content, demand
> for the same in PDF will immediately follow. After all, PDF/UA got
> started because Adobe and other PDF people saw what was happening with
> screen readers in web browsers and did not want to be left behind.
>
> Paul Topping
> Design Science, Inc.
> www.dessci.com
> > -----Original Message-----
> > From: pdftex-bounces at tug.org [mailto:pdftex-bounces at tug.org] On Behalf Of Thierry Bouche
> > Sent: Tuesday, April 29, 2008 4:04 AM
> > To: pdfTeX list
> > Subject: Re: [pdftex] experiment with tagged PDF
> > 
> > Hi Neil, Thanh, & others,
> > 
> > N> For the math part, make sure you tag the math as "formula".  Ideally, you
> > N> should tag each subexpression with the appropriate MathML element name (eg,
> > N> "mfrac" for fractions), but at the very least, add a "tex" attribute to
> > N> "formula" and include the TeX string.
> > 
> > I think this is really something we are missing today but I am not sure
> > I understand the implications: Would this help searching using tex code
> > inside the formulas? Would this be solely exposed to nonvisual PDF screen
> > readers, which would select what kind of alternative text they consume
> > based on a format-type attribute?
> > 
> > In this case, is it foreseen that any tex-aware screen reader will ever
> > exist?
> > 
> > Given that there are no Unicode 3.0 math fonts around (or that
> > not all math will be typeset with STIX hopefully anyway...), the
> > character string used to print math glyphs is useless for
> > accessibility. Sometimes, the Unicode character can be recovered from
> > the glyph name in the font, or a ToUnicode if present. But not so often
> > in our brave pdftex/CM paradigm. Does the tagging infrastructure in
> > pdftex's patch go as far as trying to match each printed glyph, math or
> > text, to a Unicode char? Would it allow for using external processes
> > such as tralics that would be fed with a constant header, and the tex
> > string of the formula, so that it could be possible to add a Formula
> > tag with pMathML content and tex source in alt (which seems to me the
> > best we can hope for accessibility and functionality, unless I am
> > completely misdirected)?
> > 
> > N> You could also add an "alt" attribute
> > N> to "formula" that contains the TeX, but as "alt" is meant to be human
> > N> readable, it is questionable whether TeX is really appropriate there.
> > 
> > Indeed, to me (and most working mathematicians), the tex code is
> > precisely the most portable, readable, useful fallback textual version
> > for a math formula. It is even what we'd dream to copy-paste from a PDF
> > (or HTML) with our today's working environment!
> > 
> > Remember Knuth said 'math coding in tex is like telling formulas with a
> > colleague over the phone'?
> > 
> > So putting the tex code in alt is not necessarily appropriate for
> > everyone, but it is the only fully textual human-readable format bearing
> > unambiguous math of any level (up to author's macros)...
> > 
> > So many questions!
> > 
> > Th.
> > 
> > _______________________________________________
> > pdftex mailing list
> > pdftex at tug.org
> > http://tug.org/mailman/listinfo/pdftex
> > 
