OT: Creating PDF with both scanned images of text and also raw text

Aaron Gray aaronngray.lists at gmail.com
Thu Jul 4 13:21:05 CEST 2019


On Thu, 4 Jul 2019 at 00:28, Peter Flynn <peter at silmaril.ie> wrote:

> On 03/07/2019 22:09, Aaron Gray wrote:
> > I am scanning old papers in both image and OCR'ed form and I want to
> > be able to combine them in a PDF document so the images are visible
> > but the text also is in the PDF for anyone who wants to extract it.
> >
> > I have found camera-ready PDFs that have text in them and been able
> > to extract both so I want to be able to do the same.
>
> The pdfimages utility will extract the images separately to PNM files,
> which you can convert to JPEG with ImageMagick or similar.
>
> What are you using for the OCR? I have had excellent results with
> Tesseract.

Sorry, no: I am after creating PDFs with image-based content and hidden text
that is retrievable with PDF text extraction tools.

Thanks,

Aaron

-- 
Aaron Gray

Independent Open Source Software Engineer, Computer Language Researcher,
Information Theorist, and amateur computer scientist.