[pdftex] OT Convert from pdf to ascii text.
John R. Culleton
john at wexfordpress.com
Sat Mar 5 14:00:45 CET 2005
This is not about pdftex as such but I don't know another place
where my question can be answered.
I index books by converting the customer's pdf file to ps, and
then the ps file to ascii text. Then I embed makeindex tags in
the resulting text file, using an editor. A customer file created
by Adobe InDesign CS gives me a fit. The conversion to
PostScript works using the Ghostscript utility pdf2ps. The
resulting file can be read in gv and pages can be selected from
gv. But there seems to be no way to boil the file down to ascii
text short of cutting and pasting into an editor. ps2ascii fails,
and so does pstotext. I have tried Slackware and Knoppix.
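
For reference, the route described above can be sketched as a small
Python script. This is only an illustration of the two steps, assuming
the Ghostscript wrappers pdf2ps and ps2ascii are on the PATH; the
function names and the file name book.pdf are made up for the example,
and on a file like the InDesign one the second step may still return
nothing usable.

```python
import shutil
import subprocess
from pathlib import Path


def build_commands(pdf_path):
    """Return the two command lines for the PDF -> PS -> text route."""
    pdf = Path(pdf_path)
    ps = pdf.with_suffix(".ps")  # e.g. book.pdf -> book.ps
    return [
        ["pdf2ps", str(pdf), str(ps)],  # Ghostscript wrapper: PDF to PostScript
        ["ps2ascii", str(ps)],          # Ghostscript wrapper: text goes to stdout
    ]


def pdf_to_text(pdf_path):
    """Run both steps and return whatever text ps2ascii produces."""
    to_ps, to_txt = build_commands(pdf_path)
    # Fail early with a clear message if either tool is missing.
    for tool in (to_ps[0], to_txt[0]):
        if shutil.which(tool) is None:
            raise FileNotFoundError(f"{tool} not found on PATH")
    subprocess.run(to_ps, check=True)
    result = subprocess.run(to_txt, check=True, capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    # Hypothetical file name; show the commands without running them.
    for cmd in build_commands("book.pdf"):
        print(" ".join(cmd))
```

Calling pdf_to_text("book.pdf") would run both commands in turn; the
resulting text is what I then load into the editor to add the
makeindex tags.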
Acrobat reader 5 will read the original pdf file but gives warning
messages about unimplemented features. Xpdf reads the file but
shows garbage characters.
The fonts are all Type 1 with encoding Identity-H, whatever that is.
Possibly it is a font problem. Possibly it is a PDF 1.5
problem. In any case it is a problem.
Able Indexers and Typesetters