Marked content in PDF (was Re: [pdftex] Re: What's new with Acrobat 5.0)

Petr Sojka sojka at informatics.muni.cz
Thu Apr 12 14:26:26 CEST 2001


On Wed, Apr 11, 2001 at 09:21:49AM +0200, Hans Hagen wrote:

> >One thing I didn't see mentioned is "Tagged PDF".  I first found this
> >mentioned at http://www.adobe.com/products/acrobat/readerforpalm.html,
> >to allow reflow when breaking it down for their Palm "Acrobat"
> >reader.  I haven't seen this mentioned anywhere else, but it would be
> >useful for TeX/LaTeX to produce such files....
> >
> >Does anyone know more about this?
> 
> page 93-142 of the diff between 1.3 / 1.4 doc describes it; tagging was
> already possible in 1.3 (marked content) but since there were no apps using
> it i never moved the experimental stuff i wrote into context. I'm still not
> sure about marked pdf, because it's kind of html. 
 
Shouldn't pdfTeX be the first application that allows generation
of marked content in PDF? Try e.g.
http://www.google.com/search?q=optimum+fit+algorithm
(as I did the evening before Thanh's defense, to check
whether Knuth really started to use the term "total fit"
instead of "optimum fit" for TeX's line-breaking algorithm).
Yes, search engines _do_ index PDF files, and they evaluate
document relevance from them.

I think it is only a matter of time before
good search engines like Google benefit from
content markup (as they currently do for HTML,
see e.g. http://dbpubs.stanford.edu:8090/pub/1998-8)
when computing document relevance. Information
providers and authors have a motivation to supply it. And since
in properly written TeX files the information is already there,
why not use it?

Best
--ps


