[pdftex] OT Convert from pdf to ascii text.

John R. Culleton john at wexfordpress.com
Sat Mar 5 14:00:45 CET 2005


This is not about pdftex as such but I don't know another place
where my question can be answered. 

I index books by converting the customer's PDF file to PostScript,
and then the PostScript file to ASCII text. Then I embed makeindex
tags in the resulting text file, using an editor. A customer file
created by Adobe InDesign CS is giving me fits. The conversion to
PostScript works using the Ghostscript utility pdf2ps. The
resulting file can be read in gv, and pages can be selected from
gv. But there seems to be no way to boil the file down to ASCII
text short of cutting and pasting into an editor. ps2ascii fails,
and so does pstotext. I have tried both Slackware and Knoppix.
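
For reference, the two-step conversion described above can be sketched
in Python roughly as follows. This is only an illustration: the
function names and the file name book.pdf are placeholders, and the
only real tools involved are pdf2ps and ps2ascii, called with an input
file followed by an output file. With the InDesign file it is the
second step that fails.

```python
import shutil
import subprocess
from pathlib import Path


def build_pipeline(pdf_path):
    """Return the two commands (as argument lists) for pdf -> ps -> text.

    The output names are derived from the input name; this naming
    scheme is an assumption made for the sketch.
    """
    pdf = Path(pdf_path)
    ps = pdf.with_suffix(".ps")
    txt = pdf.with_suffix(".txt")
    return [
        ["pdf2ps", str(pdf), str(ps)],     # Ghostscript: PDF -> PostScript
        ["ps2ascii", str(ps), str(txt)],   # Ghostscript: PostScript -> text
    ]


def run_pipeline(pdf_path):
    """Run each step in order; stop at the first missing tool or failure."""
    for cmd in build_pipeline(pdf_path):
        if shutil.which(cmd[0]) is None:
            raise FileNotFoundError(f"{cmd[0]} not found on PATH")
        # check=True raises CalledProcessError if the tool exits non-zero
        subprocess.run(cmd, check=True)
    return Path(pdf_path).with_suffix(".txt")


if __name__ == "__main__":
    # Print the commands without running them.
    for cmd in build_pipeline("book.pdf"):
        print(" ".join(cmd))
```

Building the commands separately from running them makes it easy to
see which step breaks, or to swap a different extractor in for the
second step.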

Acrobat Reader 5 will read the original PDF file but gives warning
messages about unimplemented features. Xpdf reads the file but
shows garbage characters.

The fonts are all Type 1 with encoding Identity-H, whatever that
means. 

Possibly it is a font problem. Possibly it is a PDF 1.5
problem. In any case it is a problem.

Suggestions?
-- 
John Culleton
Able Indexers and Typesetters
http://wexfordpress.com




