CMap issue in PDF file generated with recent TeX Live + Ghostscript

Shunsaku Hirata shunsaku.hirata74 at gmail.com
Sat Oct 16 06:16:22 CEST 2021


Hi,

This is probably a bug in Ghostscript.

The difference between chartest5a-2020.pdf and chartest5a-2021.pdf
is that the 2021 version has a ToUnicode CMap attached, while the
2020 version does not.

The pdftotext outputs seem to differ because the ToUnicode mapping
information is modified in some way when the file is processed
by Ghostscript.
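
For anyone who wants to check this on their own files, here is a rough
sketch of how the ToUnicode CMaps could be pulled out of a PDF with only
the Python standard library. It is not how pdftotext or Ghostscript read
the file. It naively scans for stream objects, assumes they are either
uncompressed or Flate-compressed, and treats any stream containing
"beginbfchar" or "beginbfrange" as a ToUnicode CMap. The function name
find_tounicode_cmaps is made up for this example.

```python
import re
import zlib

# Naive stream scanner: matches "stream ... endstream" without parsing the
# PDF object structure. The lookbehind avoids matching inside "endstream".
STREAM_RE = re.compile(rb"(?<!end)stream\r?\n(.*?)endstream", re.DOTALL)


def find_tounicode_cmaps(pdf_bytes):
    """Return the text of every stream that looks like a ToUnicode CMap.

    Assumption: streams are either uncompressed or use /FlateDecode only.
    Streams using other filters are simply not recognised.
    """
    cmaps = []
    for match in STREAM_RE.finditer(pdf_bytes):
        raw = match.group(1)
        try:
            # decompressobj tolerates the newline that precedes "endstream".
            data = zlib.decompressobj().decompress(raw)
        except zlib.error:
            data = raw  # not Flate-compressed; inspect the bytes as they are
        if b"beginbfchar" in data or b"beginbfrange" in data:
            cmaps.append(data.decode("latin-1"))
    return cmaps


if __name__ == "__main__":
    import sys

    # Usage: python find_cmaps.py out-2021.pdf
    for path in sys.argv[1:]:
        with open(path, "rb") as handle:
            for cmap in find_tounicode_cmaps(handle.read()):
                print(f"--- {path} ---")
                print(cmap)
```

Running it on both chartest5a-2021.pdf and out-2021.pdf should make the
two sets of mapping entries easy to compare side by side.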

In fact, the following character mappings can be found in out-2021.pdf:

<13><13><0144>
<14><14><017c>

where characters <13> and <14> (glyphs "guillemotleft" and "guillemotright")
are mapped to

    U+0144: LATIN SMALL LETTER N WITH ACUTE

and

    U+017C: LATIN SMALL LETTER Z WITH DOT ABOVE

respectively.

In the pdflatex-generated PDF chartest5a-2021.pdf, they are correctly
mapped to U+00AB and U+00BB.
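
To make the comparison concrete, the two entries quoted from out-2021.pdf
can be decoded with a few lines of Python. This is only a sketch: it
assumes the three-field entries are bfrange entries (first code, last
code, destination), and the helper name expand_bfrange is hypothetical.
The "expected" values are simply the ones reported above for
chartest5a-2021.pdf.

```python
import re
import unicodedata

# One bfrange entry: <first code> <last code> <destination code point>.
BFRANGE_RE = re.compile(
    r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>"
)


def expand_bfrange(text):
    """Map each source character code to its Unicode code point."""
    mapping = {}
    for first, last, dest in BFRANGE_RE.findall(text):
        start, end, target = int(first, 16), int(last, 16), int(dest, 16)
        for offset in range(end - start + 1):
            mapping[start + offset] = target + offset
    return mapping


# Entries quoted from out-2021.pdf (after Ghostscript).
observed = expand_bfrange("<13><13><0144>\n<14><14><017c>")

# Values reported for chartest5a-2021.pdf (before Ghostscript).
expected = {0x13: 0x00AB, 0x14: 0x00BB}

for code in sorted(observed):
    got, want = observed[code], expected[code]
    print(
        f"<{code:02x}> -> U+{got:04X} {unicodedata.name(chr(got))}; "
        f"expected U+{want:04X} {unicodedata.name(chr(want))}"
    )
```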


Thanks,
Shunsaku Hirata

On Thu, Oct 14, 2021 at 20:30, Vincent Lefevre <vincent at vinc17.net> wrote:
>
> Hi,
>
> After generating a PDF file with pdflatex, I usually run ps2pdf
> (from Ghostscript) to make the PDF much smaller (thanks to the
> font conversion from Type 1 to Type 1C).
>
> While there were no issues with TeX Live up to 2020, CMap gets
> broken when the original PDF file has been obtained with a recent
> TeX Live version.
>
> For instance, consider the following .tex file:
>
> \documentclass[12pt]{article}
> \usepackage[utf8]{inputenc}
> \usepackage[T1]{fontenc}
> \usepackage{lmodern}
> \begin{document}
> \thispagestyle{empty}
> Test: « don't finite float offer affine ».
> \end{document}
>
> I've attached 4 PDF files:
>   * chartest5a-2020.pdf generated by pdflatex with
>     texlive 2020.20210202-3 under Debian/unstable;
>   * chartest5a-2021.pdf generated by pdflatex with
>     texlive 2021.20210921-1 under Debian/unstable;
>   * out-2020.pdf: generated from chartest5a-2020.pdf with ps2pdf;
>   * out-2021.pdf: generated from chartest5a-2021.pdf with ps2pdf.
>
> The ps2pdf script actually runs:
>
>   /usr/bin/gs -P- -dSAFER -dCompatibilityLevel=1.4 -q -P- -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sstdout=%stderr -sOutputFile=out-2020.pdf -P- -dSAFER -dCompatibilityLevel=1.4 chartest5a-2020.pdf
>
> and
>
>   /usr/bin/gs -P- -dSAFER -dCompatibilityLevel=1.4 -q -P- -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sstdout=%stderr -sOutputFile=out-2021.pdf -P- -dSAFER -dCompatibilityLevel=1.4 chartest5a-2021.pdf
>
> While the chartest5a-*.pdf files seem fine, pdftotext gives the
> following output on the out-*.pdf files:
>
> out-2020.pdf
>   Test: « don’t finite float offer affine ».
>
> out-2021.pdf
>   Test: ń donŠt Ąnite Ćoat offer affine ż.
>
> So out-2020.pdf is correct, but out-2021.pdf is not.
>
> Note that in both cases, the Ghostscript version is the same (9.53.3,
> but there is the same issue for this testcase with Ghostscript 9.54).
> Thus the different behavior comes from the difference between the
> TeX Live versions.
>
> What is causing this difference? Is this a bug in TeX Live, or only
> in Ghostscript?
>
> --
> Vincent Lefèvre <vincent at vinc17.net> - Web: <https://www.vinc17.net/>
> 100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
> Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)


