OT: Creating PDF with both scanned images of text and also raw text
Aaron Gray
aaronngray.lists at gmail.com
Thu Jul 4 13:21:05 CEST 2019
On Thu, 4 Jul 2019 at 00:28, Peter Flynn <peter at silmaril.ie> wrote:
> On 03/07/2019 22:09, Aaron Gray wrote:
> > I am scanning old papers in both image and OCR'ed form and I want to
> > be able to combine them in a PDF document so the images are visible
> > but the text also is in the PDF for anyone who wants to extract it.
> >
> > I have found camera ready PDF's that have text in them and been able
> > to extract both so I want to be able to do the same.
>
> The pdfimages utility will extract the images separately to PNM files,
> which you can convert to JPEG with ImageMagick or similar.
>
> What are you using for the OCR? I have had excellent results with
> Tesseract.
>
Sorry, no. I am after creating PDFs with image-based content and hidden text
that is retrievable with PDF text extraction tools.
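
For illustration, one possible route to this kind of PDF is Tesseract's own
PDF output, which keeps the scanned page visible and places the recognised
text in an invisible layer. The sketch below is a minimal example, not a
tested workflow: the function names and the file name page-001.png are
hypothetical, and it assumes the tesseract command-line tool is installed
with English language data.

```python
import shutil
import subprocess
from pathlib import Path


def searchable_pdf_command(image_path, output_base, lang="eng"):
    """Build the Tesseract command line for a searchable PDF.

    The trailing "pdf" selects Tesseract's PDF output config, which
    writes <output_base>.pdf with the page image visible and the OCR
    text stored as an invisible text layer.
    """
    return ["tesseract", str(image_path), str(output_base), "-l", lang, "pdf"]


def make_searchable_pdf(image_path, output_base, lang="eng"):
    """Run Tesseract if it is on PATH; return the PDF path, else None."""
    if shutil.which("tesseract") is None:
        # Assumption: the tool is looked up on PATH under its usual name.
        return None
    subprocess.run(searchable_pdf_command(image_path, output_base, lang), check=True)
    return Path(f"{output_base}.pdf")


if __name__ == "__main__":
    # Hypothetical input file; prints the command rather than running it.
    print(" ".join(searchable_pdf_command("page-001.png", "page-001")))
```

Running a text extraction tool such as pdftotext on the resulting file is a
simple way to check that the hidden text is really there.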
Thanks,
Aaron
--
Aaron Gray
Independent Open Source Software Engineer, Computer Language Researcher,
Information Theorist, and amateur computer scientist.
More information about the texhax mailing list