[texhax] search for text in a pdf file
brock at quantifier.org
Fri Aug 6 14:55:52 CEST 2004
thanks, everyone, for your surprisingly helpful replies.
let's see if i can answer my own question.
I was advised that there are two sorts of .pdf files: ones made up of
text and images, and ones made up of just images. the text/image ones
should be easy to search through, while the image-only ones will require a
converter of some sort (something stronger than ps2ascii, i've found).
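one rough way to tell which sort of pdf you have is to look at what the converter wrote out. the sketch below is only a guess at a useful check, not anything the converters do themselves: it assumes the .txt output has already been read into a string, and the function name and the min_chars cutoff are made up for illustration.

```python
def looks_image_only(converted_text, min_chars=20):
    """Guess whether converter output came from an image-only pdf.

    converted_text: whatever ps2ascii or pstotext wrote to the .txt file.
    min_chars: hypothetical cutoff; fewer visible characters than this
    is treated as "no real text came out".
    """
    # str.split() with no arguments splits on all whitespace, and a
    # form feed (^L, "\f") counts as whitespace, so joining the pieces
    # leaves only the visible characters.
    visible = "".join(converted_text.split())
    return len(visible) < min_chars
```

a file that is nothing but ^L over and over comes back True here, which would point at the image-only sort.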
next i installed acroread which seems to be just a nice little port of
adobe acrobat. to get it i had to add
deb ftp://ftp.nerim.net/debian-marillat/ unstable main
to my /etc/apt/sources.list
and poof, there it is. thanks apt. it comes with a find function right
on the little tool bar, but since my copy of my pdf seems to be of the
image-only variety (I know it's just a scan of a journal) i couldn't find
anything.
so now i'm back where i started, only just a bit smarter. so what else do
y'all use to pull text out of a pdf such as this one?
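for the text/image sort of pdf, where a converter does produce real text, the phrase search itself is easy to script. this is a minimal sketch under a couple of assumptions: that the converter puts one form feed (^L) between pages, and that the phrase may be broken across lines in the output. find_phrase is a made-up name, not part of any tool mentioned here.

```python
import re


def find_phrase(converted_text, phrase):
    """Return the 1-based page numbers whose text contains phrase."""
    words = [re.escape(w) for w in phrase.split()]
    if not words:
        return []
    # Let any run of whitespace (including a line break) stand in for
    # the spaces in the phrase, and ignore case.
    pattern = re.compile(r"\s+".join(words), re.IGNORECASE)
    # Assumption: one form feed ("\f") separates consecutive pages.
    pages = converted_text.split("\f")
    return [n for n, page in enumerate(pages, start=1) if pattern.search(page)]
```

with the converter output read into a string, find_phrase(text, "some phrase") gives back the pages worth opening in a viewer. it does nothing for an image-only scan, since there is no text to match against.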
You will get what you deserve.
= http://www.gmail-is-too-creepy.com - http://hushmail.com -=-=-=-=
On Fri, 6 Aug 2004, Philip TAYLOR wrote:
> Maybe use Adobe Acrobat ? That has inbuilt search facilities.
> Philip Taylor
> BRØCK at QUÂNTIFIER.ORG wrote:
> > hello smart folks. i have a pdf file that is some 350 pages. and i need
> > to search for one phrase. gv doesnt seem to have a search function, i've
> > tried ps2ascii and pstotext after converting the pdf with pdf2ps. the
> > resulting txt files just have a bunch of ^L over and over. I'm stumped.
> > how do i search for a phrase in a pdf file?
> > thanks for any help from a forum that's not really about pdf.
> > bobby