Marked content in PDF (was Re: [pdftex] Re: What's new with Acrobat 5.0)
sojka at informatics.muni.cz
Thu Apr 12 14:26:26 CEST 2001
On Wed, Apr 11, 2001 at 09:21:49AM +0200, Hans Hagen wrote:
> >One thing I didn't see mentioned is "Tagged PDF". I first found this
> >mentioned at http://www.adobe.com/products/acrobat/readerforpalm.html,
> >to allow reflow when breaking it down for their Palm "Acrobat"
> >reader. I haven't seen this mentioned anywhere else, but it would be
> >useful for TeX/LaTeX to produce such files....
> >Does anyone know more about this?
> page 93-142 of the diff between 1.3 / 1.4 doc describes it; tagging was
> already possible in 1.3 (marked content) but since there were no apps using
> it i never moved the experimental stuff i wrote into context. I'm still not
> sure about marked pdf, because it's kind of html.
Shouldn't pdfTeX be the first application that allows generation
of marked content in PDF? Try e.g.
(as I did the evening before Thanh's defense to check
whether Knuth really started to use the term "total fit"
instead of "optimum fit" for TeX's line-breaking algorithm).
Yes, search engines _do_ index PDF files, and they evaluate
document relevance from them.
I think it is only a matter of time before
good search engines like Google benefit from
the content markup (as they currently do for HTML,
see e.g. http://dbpubs.stanford.edu:8090/pub/1998-8)
when computing document relevance. Information
providers and authors have a motivation to provide it. And since
in properly written TeX files the information is already there,
why not use it?