[Tugindia] Closeup of word spaces in PDF text extraction

Manjusha Joshi manjusha.joshi at gmail.com
Wed Apr 6 13:24:16 CEST 2011


> On extracting text from PDF (created from dvips driver) certain word spaces
> are getting closed up, like the characters followed by 'W'. This happens
> only for serif fonts like Times-Bold, and not with the sans serif fonts like
> Helvetica.
> I found that when one extracts text containing words with fi or ff in
> their spelling, these letters become unreadable in an IDE like Kile.

> Can anyone advise how to get the correct extraction of text from the
> LaTeX-generated PDFs?

One solution I found is to copy the text with the copy tool of the PDF
viewer and paste it into gedit, which displays the pasted text correctly.
Maybe it works in your case too.
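For the fi/ff problem, one possibility (an assumption, not something I have
verified for dvips output) is that the extracted text contains the Unicode
ligature characters such as U+FB01 and U+FB00 instead of the plain letters,
and the editor's font cannot show them. If so, they can be replaced before the
text is used. A minimal Python sketch; the name expand_ligatures is only
illustrative:

```python
# Hypothetical helper: assumes the extracted text contains the Unicode
# "Alphabetic Presentation Forms" ligatures U+FB00..U+FB04.
LIGATURES = {
    "\ufb00": "ff",
    "\ufb01": "fi",
    "\ufb02": "fl",
    "\ufb03": "ffi",
    "\ufb04": "ffl",
}

# str.maketrans accepts single-character keys mapped to strings of any length.
_TABLE = str.maketrans(LIGATURES)


def expand_ligatures(text: str) -> str:
    """Replace the common f-ligature code points with plain letters."""
    return text.translate(_TABLE)


if __name__ == "__main__":
    sample = "e\ufb03cient \ufb01le o\ufb00set"
    print(expand_ligatures(sample))  # efficient file offset
```

An alternative is unicodedata.normalize("NFKC", text), which also expands
these ligatures but changes other compatibility characters as well (for
example full-width letters), so the explicit table is the narrower fix. If the
PDF maps the ligatures to unrelated or missing characters, neither approach
will help.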

Manjusha S. Joshi
Lecturer in Computational Mathematics,
BIM, Pune, India. www.bprim.org
Mobile:  09822 319328
