[OS X TeX] converting ligatures into text

Maarten Sneep maarten.sneep at xs4all.nl
Fri Apr 22 15:45:16 CEST 2005


On 22 apr 2005, at 15:37, Lawrence Paulson wrote:

>  I have to extract text from a large number of PDF documents produced 
> using TeX. Because (I presume) of TeX's non-standard font encodings, 
> cut and paste often goes wrong. In particular, ligatures get garbled: 
> I get di±cult instead of difficult.

What tool do you use to extract the text? Copy & paste from Acrobat, or 
pdftotext? (pdftotext is part of xpdf; you can compile it yourself or get 
an installer from http://www.bluem.net/downloads/pdftotext_en/.)

> Does anybody know of a program (or of a definitive set of replacements 
> that could be given to Perl) for cleaning up such text?

That would depend on the font encodings used in the PDFs, and on the 
encoding you expect for the text in the file you create. I think this 
is a tough one to answer in general.
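
For the specific garbling quoted above (± showing up where ffi should 
be), a replacement table could look roughly like the sketch below. It is 
in Python rather than Perl, and it is only a starting point: the ± -> ffi 
entry comes from the example in the question, while the other four 
entries are my assumption that the fonts are Type 1 Computer Modern with 
the low OT1 ligature slots duplicated at codes 174-178. The names 
GARBLED_LIGATURES and fix_ligatures are made up for illustration.

```python
import re
import unicodedata

# Assumed mapping: Latin-1 characters that appear when the OT1 ligature
# slots of a Computer Modern Type 1 font are copied out as text.
# Only the "±" entry is confirmed by the example in this thread.
GARBLED_LIGATURES = {
    "\u00ae": "ff",   # ®
    "\u00af": "fi",   # ¯
    "\u00b0": "fl",   # °
    "\u00b1": "ffi",  # ±
    "\u00b2": "ffl",  # ²
}

# Replace a suspect character only when it touches an ASCII letter, so
# that "5 ± 2" or "20°" are left alone.
_SUSPECT = re.compile(
    r"(?<=[A-Za-z])[\u00ae-\u00b2]|[\u00ae-\u00b2](?=[A-Za-z])"
)


def fix_ligatures(text):
    """Return text with garbled or precomposed ligatures spelled out."""
    # Genuine Unicode ligature characters (U+FB00..U+FB04) decompose
    # to plain letters under NFKC normalization.
    text = "".join(
        unicodedata.normalize("NFKC", ch) if "\ufb00" <= ch <= "\ufb04" else ch
        for ch in text
    )
    return _SUSPECT.sub(lambda m: GARBLED_LIGATURES[m.group(0)], text)


if __name__ == "__main__":
    print(fix_ligatures("a di\u00b1cult problem"))
```

The letter-adjacency check is a heuristic. It will still mangle things 
like a trademark sign directly after a word or a superscript two after 
a variable name, and PDFs made with other fonts or encodings will need 
a different table.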

Maarten
--------------------- Info ---------------------
Mac-TeX Website: http://www.esm.psu.edu/mac-tex/
           & FAQ: http://latex.yauh.de/faq/
TeX FAQ: http://www.tex.ac.uk/faq
List Post: <mailto:MacOSX-TeX at email.esm.psu.edu>




