pdftotext removing "fi" from a recent pdf I made with latex,

Tomas Rokicki rokicki at gmail.com
Sun Nov 24 11:50:59 CET 2019


I believe this is largely a poppler problem.  I'd be happy to discuss it a
bit more if you would like.
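
Whichever side is at fault, a TeX-side workaround that is often suggested for
pdfTeX may be worth trying. This is only a sketch and has not been tested
against the file in question. It assumes pdfLaTeX is the engine (the quoted
exif output shows pdfTeX-1.40.16) and that the Latin Modern fonts and
glyphtounicode.tex are installed, as they are in a standard TeX Live. The
idea is to embed outline fonts with named glyphs and have pdfTeX write
ToUnicode maps, so that a text extractor can map the "fi" ligature glyph
back to characters.

```latex
\documentclass{article}
% Assumption: pdfTeX/pdfLaTeX is the engine.
\input{glyphtounicode}   % glyph-name to Unicode table shipped with pdfTeX
\pdfgentounicode=1       % ask pdfTeX to write ToUnicode maps for embedded fonts
\usepackage[T1]{fontenc}
% Assumption: Latin Modern is installed; it supplies Type 1 outlines in
% T1 encoding, so the "fi" ligature is a named glyph rather than a bitmap.
\usepackage{lmodern}
\begin{document}
test
a word that defines the problem, d e f i n e s
\end{document}
```

To check, compile with pdflatex and run "pdftotext test.pdf -" (test.pdf is a
placeholder name) to see whether "defines" now comes out whole.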

-tom


On Sun, Nov 24, 2019 at 2:47 AM Mike Marchywka <marchywka at hotmail.com>
wrote:

> On Sun, Nov 24, 2019 at 12:11:07AM +0000, Mike Marchywka wrote:
> >
> > I have never seen this before. It looks like a stupid font problem,
> > but it is likely to be common with many PDFs now. If I just run
> > "pdftotext" on my output, I get weird boxes where each "fi"
> > is. If I use "-enc ASCII7", each "fi" is deleted entirely.
> >
> > I could probably create a minimal working example but thought someone
> > may know offhand. Thanks.
>
> Never mind, I figured it out :) I had added this stupid thing
>
> \usepackage[T1]{fontenc}
>
> to fix another problem. So if you are finding that pdftotext output
> is jumbled, or you want to use the pdf (and maybe dvi) format
> to obscure information that would be readable in a normal text file,
> this seems to do it,
>
>
>
> \documentclass{article}
> \usepackage[T1]{fontenc}
> \usepackage{hyperref}
> % custom bibliographic keys stored in the PDF info dictionary
> \hypersetup{
>   pdfinfo={
>     x-bib-author  = {A. Writer},
>     x-bib-journal = {Test},
>     x-bib-buy-url = {https://buyexpensivejunk}
>   }
> }
>
> % \addbib{key}{value} adds one more x-bib-<key> entry
> \newcommand{\addbib}[2]{%
>   \hypersetup{pdfinfo={x-bib-#1 = {#2}}}%
> }
> \addbib{author}{marchywka}
> \addbib{title}{my title}
> \addbib{omething}{foobar abstratct asdfasdfa }
>
> \begin{document}
> test
> a word that defines the problem, d e f i n e s
> \end{document}
>
>
> Compiling to pdf and converting back to text gives this,
>
> cat schumann.pdf | pdftotext - -
> test a word that de nes the problem, d e f i n e s
>
> 1
>
>
>
>
> >
> > This is the version,
> >
> > pdftotext -v
> > pdftotext version 0.41.0
> > Copyright 2005-2016 The Poppler Developers - http://poppler.freedesktop.org
> > Copyright 1996-2011 Glyph & Cog, LLC
> >
> > and basic info on the pdf file,
> > exifutil -list vitaprop.pdf
> > ExifTool Version Number         : 11.75
> > File Name                       : vitaprop.pdf
> > Directory                       : .
> > File Size                       : 287 kB
> > File Modification Date/Time     : 2019:11:23 06:17:53-05:00
> > File Access Date/Time           : 2019:11:23 06:17:53-05:00
> > File Inode Change Date/Time     : 2019:11:23 06:17:53-05:00
> > File Permissions                : rw-rw-r--
> > File Type                       : PDF
> > File Type Extension             : pdf
> > MIME Type                       : application/pdf
> > PDF Version                     : 1.5
> > Linearized                      : No
> > Page Count                      : 12
> > Page Mode                       : UseOutlines
> > Author                          :
> > Title                           :
> > Subject                         :
> > Creator                         : LaTeX with hyperref package
> > Producer                        : pdfTeX-1.40.16
> > Create Date                     : 2019:11:23 06:17:52-05:00
> > Modify Date                     : 2019:11:23 06:17:52-05:00
> > Trapped                         : False
> > PTEX Fullbanner                 : This is pdfTeX, Version 3.14159265-2.6-1.40.16 (TeX Live 2015/Debian) kpathsea version 6.2.1
> >
> >
> > --
> >
> > mike marchywka
> > 306 charles cox
> > canton GA 30115
> > USA, Earth
> > marchywka at hotmail.com
> > 404-788-1216
> > ORCID: 0000-0001-9237-455X
> >
>

-- 
--  http://cube20.org/  --  http://golly.sf.net/  --


More information about the texhax mailing list