[texhax] search for text in a pdf file

Tom Schneider toms at ncifcrf.gov
Fri Aug 6 17:45:21 CEST 2004


> so now i'm back where i started, only just a bit smarter.  so what else do
> y'all use to pull text out of a pdf such as this one?

If I understand you correctly, the PDF contains an image of the page
rather than actual text, and you want the text.  That requires optical
character recognition (OCR), which is a hard problem.

Fortunately there is at least one open source OCR project, called GOCR
or JOCR:

http://jocr.sourceforge.net/

It's still in development but might do the trick.  I'm sure they would
appreciate the attention ...
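
For what it's worth, here is a rough sketch of how one might drive it
from Python.  It assumes (this is my assumption, not something from the
GOCR pages) that pdftoppm from Xpdf or Poppler is installed to turn each
PDF page into a grayscale PGM image, and that gocr is on the PATH.  The
function names and the "page" file prefix are made up for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def rasterize_command(pdf_path, prefix="page", dpi=300):
    """Build the pdftoppm call that renders each PDF page to a grayscale PGM."""
    return ["pdftoppm", "-r", str(dpi), "-gray", str(pdf_path), str(prefix)]


def gocr_command(image_path):
    """Build the gocr call for one page image; text is written next to it as .txt."""
    image_path = Path(image_path)
    return ["gocr", "-i", str(image_path), "-o", str(image_path.with_suffix(".txt"))]


def ocr_pdf(pdf_path, workdir="."):
    """Rasterize a PDF, run gocr on every page image, and return the joined text."""
    # Assumption: both external tools are installed; fail early if not.
    for tool in ("pdftoppm", "gocr"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} not found on PATH")

    workdir = Path(workdir)
    subprocess.run(rasterize_command(pdf_path, workdir / "page"), check=True)

    texts = []
    # pdftoppm names its output <prefix>-<page number>.pgm when -gray is given.
    for image in sorted(workdir.glob("page-*.pgm")):
        cmd = gocr_command(image)
        subprocess.run(cmd, check=True)
        texts.append(Path(cmd[-1]).read_text(errors="replace"))
    return "\n".join(texts)
```

Calling ocr_pdf("scan.pdf") would then return whatever text gocr could
recognize.  A higher resolution (the dpi argument) often helps, but
expect to proofread the result.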

Tom

  Dr. Thomas D. Schneider
  National Cancer Institute
  Laboratory of Experimental and Computational Biology
  Frederick, Maryland  21702-1201
  toms at ncifcrf.gov
  permanent email: toms at alum.mit.edu (use only if first address fails)
  http://www.lecb.ncifcrf.gov/~toms/


