OT: Creating PDF with both scanned images of text and also raw text

Aaron Gray aaronngray.lists at gmail.com
Thu Jul 4 13:21:05 CEST 2019

On Thu, 4 Jul 2019 at 00:28, Peter Flynn <peter at silmaril.ie> wrote:

> On 03/07/2019 22:09, Aaron Gray wrote:
> > I am scanning old papers in both image and OCR'ed form and I want to
> > be able to combine them in a PDF document so the images are visible
> > but the text also is in the PDF for anyone who wants to extract it.
> >
> > I have found camera-ready PDFs that have text in them and been able
> > to extract both, so I want to be able to do the same.
> The pdfimages utility will extract the images separately to PNM files,
> which you can convert to JPEG with ImageMagick or similar.
> What are you using for the OCR? I have had excellent results with


Sorry, no, I am after creating PDFs with image-based content and hidden text
that is retrievable with PDF text extraction tools.
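
To make the goal concrete: as far as I understand the PDF format, the usual
mechanism for this is text rendering mode 3 (the `3 Tr` operator), which
draws text invisibly. The page paints the scan as an image and then places
the OCR text in that mode, so viewers show only the image while extraction
tools still find the text. Below is a minimal sketch in Python using only
the standard library. The function name build_pdf, the single grey pixel
standing in for a real scan, the US Letter page size, and the fixed text
position are all assumptions made for illustration, not anything from an
existing tool.

```python
def build_pdf(hidden_text: str) -> bytes:
    """Return a one-page PDF: a full-page image with invisible text on top.

    Illustrative sketch only; assumes plain ASCII text and a placeholder image.
    """
    # Escape the characters that are special inside a PDF literal string.
    escaped = (hidden_text.replace("\\", "\\\\")
                          .replace("(", "\\(")
                          .replace(")", "\\)"))

    # Page content: scale the image to cover the page, then write the text
    # in render mode 3 (neither filled nor stroked, hence invisible).
    content = (
        "q 612 0 0 792 0 0 cm /Im0 Do Q\n"
        f"BT 3 Tr /F1 12 Tf 72 720 Td ({escaped}) Tj ET"
    ).encode("ascii")

    # Placeholder "scan" (assumption): one mid-grey pixel, uncompressed.
    pixel = b"\x80"

    # Objects 1..6, in order: catalog, page tree, page, font, image, content.
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Resources << /Font << /F1 4 0 R >> /XObject << /Im0 5 0 R >> >> "
        b"/Contents 6 0 R >>",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
        b"<< /Type /XObject /Subtype /Image /Width 1 /Height 1 "
        b"/ColorSpace /DeviceGray /BitsPerComponent 8 /Length %d >>\n"
        b"stream\n%s\nendstream" % (len(pixel), pixel),
        b"<< /Length %d >>\nstream\n%s\nendstream" % (len(content), content),
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of this object, for the xref
        out += b"%d 0 obj\n%s\nendobj\n" % (number, body)

    # Cross-reference table: each entry must be exactly 20 bytes long.
    xref_start = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset

    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_start
    return bytes(out)
```

Writing the returned bytes to a file and running a text extraction tool
such as pdftotext on it should give back the hidden string, while a viewer
shows only the grey page. A real document would embed the actual scan and
position each OCR'ed word over its place in the image.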



Aaron Gray

Independent Open Source Software Engineer, Computer Language Researcher,
Information Theorist, and amateur computer scientist.