OT: Creating PDF with both scanned images of text and also raw text

Peter Flynn peter at silmaril.ie
Thu Jul 4 01:26:57 CEST 2019


On 03/07/2019 22:09, Aaron Gray wrote:
> I am scanning old papers in both image and OCR'ed form and I want to
> be able to combine them in a PDF document so the images are visible
> but the text also is in the PDF for anyone who wants to extract it.
> 
> I have found camera-ready PDFs that have text in them and been able
> to extract both so I want to be able to do the same.

The pdfimages utility will extract the images separately to PNM files,
which you can convert to JPEG with ImageMagick or similar.
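
As a rough sketch of that extraction step, the following Python wrapper
builds and runs the two commands. It assumes the Poppler version of
pdfimages and the ImageMagick 6 "convert" command are on the PATH (under
ImageMagick 7 the command is "magick"). The file names "scan.pdf" and
"extracted", and the function names, are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def pdfimages_command(pdf_path, image_root):
    """Build the argv for pdfimages; with no format flag it writes PPM/PBM files."""
    return ["pdfimages", str(pdf_path), str(image_root)]


def convert_command(pnm_path):
    """Build the argv for ImageMagick's convert, turning one PNM file into a JPEG."""
    pnm_path = Path(pnm_path)
    return ["convert", str(pnm_path), str(pnm_path.with_suffix(".jpg"))]


def extract_images(pdf_path, out_dir):
    """Run pdfimages, then convert every extracted PNM file to JPEG."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # pdfimages names its output <image_root>-NNN.ppm (or .pbm for bilevel images)
    subprocess.run(pdfimages_command(pdf_path, out_dir / "page"), check=True)
    jpegs = []
    for pnm in sorted(out_dir.glob("page-*.p[bgp]m")):
        cmd = convert_command(pnm)
        subprocess.run(cmd, check=True)
        jpegs.append(Path(cmd[-1]))
    return jpegs


if __name__ == "__main__":
    # Only run when the tools and the (hypothetical) input file are present.
    tools_ok = shutil.which("pdfimages") and shutil.which("convert")
    if tools_ok and Path("scan.pdf").exists():
        print(extract_images("scan.pdf", "extracted"))
```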

What are you using for the OCR? I have had excellent results with
Tesseract.
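
For the combined image-plus-text PDF itself, Tesseract can write a PDF
in which the scanned image is visible and the recognised text sits in an
invisible layer behind it, by naming "pdf" as the output format. The
sketch below assumes Tesseract and Poppler's pdfunite are installed; the
page file names, the "eng" language choice and the function names are
illustrative only.

```python
import shutil
import subprocess
from pathlib import Path


def tesseract_pdf_command(image_path, output_base, lang="eng"):
    """Build the argv that makes Tesseract write <output_base>.pdf with a text layer."""
    return ["tesseract", str(image_path), str(output_base), "-l", lang, "pdf"]


def pdfunite_command(page_pdfs, combined_pdf):
    """Build the argv for pdfunite: all source PDFs first, destination last."""
    return ["pdfunite", *[str(p) for p in page_pdfs], str(combined_pdf)]


def make_searchable_pdf(image_paths, combined_pdf, lang="eng"):
    """OCR each page image to its own PDF, then join the pages into one file."""
    page_pdfs = []
    for image in image_paths:
        base = Path(image).with_suffix("")  # Tesseract appends ".pdf" itself
        subprocess.run(tesseract_pdf_command(image, base, lang), check=True)
        page_pdfs.append(Path(str(base) + ".pdf"))
    subprocess.run(pdfunite_command(page_pdfs, combined_pdf), check=True)
    return Path(combined_pdf)


if __name__ == "__main__":
    # Hypothetical input files; nothing runs unless tools and pages exist.
    pages = [Path("page1.png"), Path("page2.png")]
    tools_ok = shutil.which("tesseract") and shutil.which("pdfunite")
    if tools_ok and all(p.exists() for p in pages):
        print(make_searchable_pdf(pages, "paper.pdf"))
```

Text in a PDF made this way can be selected in a viewer or pulled out
with pdftotext, while the page still looks like the original scan.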

Peter

