[texhax] search for text in a pdf file
karl at freefriends.org
Sat Aug 7 03:12:05 CEST 2004
> so now i'm back where i started, only just a bit smarter. so what else do
> y'all use to pull text out of a pdf such as this one?
In general, pdftotext from xpdf can be better than pdf2ps | ps2ascii.
But if the text search in xpdf or Acrobat doesn't find anything, pdftotext
won't help either (the PDF most likely contains only scanned page images,
with no text layer), and OCR is your only hope.
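A minimal Python sketch of that check, assuming the `pdftotext` command-line tool (shipped with xpdf, and also with poppler) is on the PATH. `extract_text` and `needs_ocr` are hypothetical helper names. Treating empty output as "no text layer" is a heuristic, not a guarantee.

```python
import subprocess


def extract_text(pdf_path: str) -> str:
    """Run pdftotext on pdf_path and return whatever text it extracts.

    Passing "-" as the output file makes pdftotext write to stdout.
    check=True raises CalledProcessError if pdftotext exits non-zero.
    """
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def needs_ocr(extracted: str) -> bool:
    """Heuristic: if only whitespace came back (pdftotext separates pages
    with form feeds, which str.strip() also removes), the PDF probably has
    no text layer, e.g. a scanned document, so OCR is the remaining option."""
    return not extracted.strip()
```

For example, `needs_ocr(extract_text("paper.pdf"))` returning `True` would suggest the file is image-only and an OCR tool such as OCRAD is the next thing to try.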
Fortunately there is at least one open source project:
Yep, that's a big one. There are others.
Another one is OCRAD, which was offered to GNU, and eventually accepted:
I found these links while evaluating OCRAD about a year ago. I don't know
if they're still valid, but FWIW:
I've never tried any of them personally.