[texhax] search for text in a pdf file

Randolph J. Herber herber at dcdrjh.fnal.gov
Fri Aug 6 17:42:19 CEST 2004


Perhaps, just perhaps, print and eye-ball, or the electronic equivalent:
optical scanners that supply text output (OCR).  That is how many
of the electronic books at Project Gutenberg and Blackmask are done.
If you can obtain permission to make the document acceptable to these
projects (having to do with copyrights), then they just might do the
optical scanning to text for you.
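
If you would rather try the optical route yourself, something along these
lines might do.  This is only a sketch under assumptions that are not from
this thread: that the pdftoppm tool (from poppler) and the tesseract OCR
program are both installed, and that rendering at 300 dpi is good enough for
a journal scan.  The function names are made up for illustration.

```python
import glob
import re
import subprocess


def render_command(pdf_path, prefix, dpi=300):
    """Command that renders every PDF page to <prefix>-<n>.png (pdftoppm)."""
    return ["pdftoppm", "-r", str(dpi), "-png", pdf_path, prefix]


def ocr_command(image_path):
    """Command that OCRs one image; 'stdout' makes tesseract print the text."""
    return ["tesseract", image_path, "stdout"]


def page_number(image_path):
    """Pull the page number out of a name such as 'page-012.png'."""
    match = re.search(r"-(\d+)\.png$", image_path)
    return int(match.group(1)) if match else 0


def ocr_pdf(pdf_path, prefix="page"):
    """Yield (page_number, text) pairs.  Needs both tools installed."""
    subprocess.run(render_command(pdf_path, prefix), check=True)
    # Sort on the numeric suffix so the pages come back in order.
    images = sorted(glob.glob(prefix + "-*.png"), key=page_number)
    for image in images:
        result = subprocess.run(
            ocr_command(image), capture_output=True, text=True, check=True
        )
        yield page_number(image), result.stdout
```

With that in place, looping over ocr_pdf("journal.pdf") and testing each
page's text for the phrase would report the pages worth reading by eye.
Expect recognition errors, so a near miss in the spelling is still worth
a look.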

The following header lines retained to effect attribution:
>Date: Fri, 06 Aug 2004 08:55:52 -0400 (EDT)
>From: =?iso-8859-1?Q?BR=D8CK=E2QU=C2NTIFIER=2EORG?= <brock at quantifier.org>
>Subject: Re: [texhax] search for text in a pdf file
>To: Philip TAYLOR <P.Taylor at rhul.ac.uk>
>Cc: texhax at tug.org

>thanks, everyone, for your surprisingly helpful replies.

>let's see if i can answer my own question.

>I was advised that there are two sorts of .pdf files.  ones made up of
>text and images, and ones made up of just images.  the text/image ones
>should be easy to search through, while the image only ones will require a
>converter of some sort (something stronger than ps2ascii, i've found).
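
A rough way to tell the two sorts apart, and to search the first sort, can be
sketched in Python.  The assumption here is that the pdftotext tool (from xpdf
or poppler) is installed; it ends each page with a form feed, which is the ^L
seen in the empty conversions.  The function names are made up for
illustration.

```python
import subprocess


def extract_text(pdf_path):
    """Run pdftotext (assumed to be installed) and return what it prints."""
    # "-" as the output file asks pdftotext to write to stdout.
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def split_pages(text):
    """Split extractor output into pages at each form feed (^L)."""
    pages = text.split("\f")
    # Drop the empty piece left after the final form feed.
    if pages and pages[-1].strip() == "":
        pages.pop()
    return pages


def looks_image_only(text):
    """True when nothing but form feeds and blanks came back."""
    return text.strip() == ""


def find_phrase(text, phrase):
    """Return the 1-based page numbers whose text contains the phrase."""
    # Collapse whitespace and case so a phrase broken across lines matches.
    needle = " ".join(phrase.lower().split())
    hits = []
    for number, page in enumerate(split_pages(text), start=1):
        haystack = " ".join(page.lower().split())
        if needle in haystack:
            hits.append(number)
    return hits
```

If looks_image_only(extract_text("journal.pdf")) comes back true, the file is
of the image only sort and no text search will help until it has been through
character recognition.  Otherwise find_phrase gives the pages to turn to.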

>next i installed acroread which seems to be just a nice little port of
>adobe acrobat.  to get it i had to add

>deb ftp://ftp.nerim.net/debian-marillat/ unstable main

>to my /etc/apt/sources.list

>and poof, there it is.  thanks apt.  it comes with a find function right
>on the little tool bar, but since my copy of my pdf seems to be of the
>image only variety (I know it's just a scan of a journal) i couldn't find
>text.  sucks.

>so now i'm back where i started, only just a bit smarter.  so what else do
>y'all use to pull text out of a pdf such as this one?

>thanks,

>=f=o=r=t=u=n=e=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
>You will get what you deserve.
>=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
>= http://www.gmail-is-too-creepy.com - http://hushmail.com -=-=-=-=

>On Fri, 6 Aug 2004, Philip TAYLOR wrote:

>> Maybe use Adobe Acrobat ?  That has inbuilt search facilities.
>> Philip Taylor
>> --------
>> BRØCKâQUÂNTIFIER.ORG wrote:
>>  >
>>  > hello smart folks.  i have a pdf file that is some 350 pages.  and i need
>>  > to search for one phrase.  gv doesn't seem to have a search function, i've
>>  > tried ps2ascii and pstotext after converting the pdf with pdf2ps.  the
>>  > resulting txt files just have a bunch of ^L over and over.  I'm stumped.
>>  > how do i search for a phrase in a pdf file?
>>  >
>>  > thanks for any help from a forum that's not really about pdf.
>>  >
>>  > bobby

>> _______________________________________________
>> TeX FAQ: http://www.tex.ac.uk/faq
>> TeX newsgroup: http://groups.google.com/groups?group=comp.text.tex
>> Mailing list archives: http://tug.org/pipermail/texhax/
>> More links: http://tug.org/begin.html

>> Automated subscription management: http://tug.org/mailman/listinfo/texhax
>> Human mailing list managers: postmaster at tug.org


Randolph J. Herber, herber at fnal.gov, +1 630 840 2966, CD/CDFTF PK-149F,
Mail Stop 318, Fermilab, Kirk & Pine Rds., PO Box 500, Batavia, IL 60510-0500,
USA.  (Speaking for myself and not for US, US DOE, FNAL nor URA.)  (Product,
trade, or service marks herein belong to their respective owners.)


