OT: Creating PDF with both scanned images of text and also raw text

Peter Flynn peter at silmaril.ie
Thu Jul 4 20:02:00 CEST 2019

Oh right, I see, I misunderstood.


On 4 July 2019 12:23:08 Aaron Gray <aaronngray.lists at gmail.com> wrote:
> On Thu, 4 Jul 2019 at 00:28, Peter Flynn <peter at silmaril.ie> wrote:
>> On 03/07/2019 22:09, Aaron Gray wrote:
>>> I am scanning old papers in both image and OCR'ed form and I want to
>>> be able to combine them in a PDF document so the images are visible
>>> but the text also is in the PDF for anyone who wants to extract it.
>>> I have found camera-ready PDFs that have text in them and been able
>>> to extract both so I want to be able to do the same.
>> The pdfimages utility will extract the images separately to PNM files,
>> which you can convert to JPEG with ImageMagick or similar.
>> What are you using for the OCR? I have had excellent results with Tesseract.
> Sorry, no, I am after creating PDFs with image-based content and hidden text
> that is retrievable with PDF text extraction tools.
> Thanks,
> Aaron
> --
> Aaron Gray
> Independent Open Source Software Engineer, Computer Language Researcher, 
> Information Theorist, and amateur computer scientist.


More information about the texhax mailing list