[pdftex] pdf inclusion bug

Fri Mar 29 22:51:38 CET 2019

Hi Robert, and others

> On 29 Mar 2019, at 1:15 pm, Robert <w.m.l at gmx.net> wrote:
> 
> Hi,
> 
> testing TL19, I stumbled upon a bug in pdftex that seems to have existed for a while.

I’ve confirmed all of the below, using Acrobat Pro’s Preflight to check whether there
is any bad PDF in the examples.

> 
> Compile this file (a.tex) with a current luatex 1.10:
> 
> ------------------------------
> \pdfvariable compresslevel0
> \nopagenumbers
> \pageheight=5cm
> \pagewidth=5cm
> ABC
> \bye
> ------------------------------
> 
> and then compile this file (b.tex) with pdflatex:
> 
> ------------------------------
> \pdfcompresslevel0
> \documentclass{article}
> \pagestyle{empty}
> \usepackage{graphicx}
> \begin{document}
> \includegraphics{a}
> \end{document}
> ------------------------------
> 
> This will result in three warnings:
> 
> pdfTeX warning: [...]: glyph `' undefined
> 
> pdfTeX warning: [...]: glyph `A ' undefined
> 
> pdfTeX warning: [...]: glyph `B ' undefined
> 
> and only the "C" will be visible in b.pdf. And if you look into b.pdf, the included font contains "/CharSet (//A /B /C)", which is obviously wrong.

It’s actually a bit worse than this.
The font subset’s Encoding vector has been rewritten:

/Encoding 256 array
0 1 255 {1 index exch /.notdef put} for
dup 67 /C put
readonly def

which is why only the C is found.

> 
> Everything is fine if I compile a.tex with an older luatex. The difference is that previous versions of luatex wrote the charset without spaces: "/CharSet(/A/B/C)", whereas luatex 1.10 writes: "/CharSet( /A /B /C)". However, this seems like perfectly valid PDF to me, so I'd say that luatex is innocent here, but that this is a bug in pdftex's inclusion procedure.

Acrobat’s Preflight reports no problem with the  /CharSet( /A /B /C)  string in  a.pdf.
So I’m guessing that pdftex is trying to merge the instance of CMR10 from  a.pdf  with the one
required for building  b.pdf  (even though no extra characters are actually used).
In doing this it parses  ( /A /B /C)  incorrectly, as  ‘ ’, ‘/A ’, ‘/B ’, ‘/C’,  and finds a glyph (CharString) for /C only.

Unfortunately the PDF Spec says nothing about having the space character as a delimiter;
viz.

(Optional; meaningful only in Type 1 fonts; PDF 1.1) A string listing
the character names defined in a font subset. The names in this
string shall be in PDF syntax—that is, each name preceded by a
slash (/). The names may appear in any order. The name .notdef
shall be omitted; it shall exist in the font subset. If this entry is
absent, the only indication of a font subset shall be the subset tag
in the FontName entry (see 9.9.1, "Font subsets").

In PDF 2.0 the CharSet can (and should) be omitted entirely; viz.

(PDF 2.0: The
presence of this value in a PDF may cause a PDF to display
differently from how it will be printed. It shall be considered
deprecated and PDF writers shall not write it, while PDF readers
should ignore it when present.)

Nevertheless,  a.pdf  seems to be valid PDF, so yes, it is pdftex that is getting this wrong.
I’d expect pdftex to strip the spaces when it parses the string.

BTW, the PDF Spec defines a name object this way:

4.33
name object
atomic symbol uniquely defined by a sequence of characters introduced by a SOLIDUS (/), (2Fh) but the
SOLIDUS is not considered to be part of the name

which again says nothing about trailing spaces.
However, it does say:

4.46
white-space character
characters that separate PDF syntactic constructs such as names and numbers from each other; white space characters are HORIZONTAL TAB (09h), LINE FEED (0Ah), FORM FEED (0Ch), CARRIAGE RETURN (0Dh), SPACE (20h); (see Table 1 in 7.2.2, “Character Set”)

Since the CharSet string is a sequence of syntactic constructs (names), this would imply that the spaces are
just separators here and can be removed, as can the other usual white-space characters.
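To make the point concrete, here is a minimal Python sketch of the difference. It is not pdftex’s code (pdftex is written in C, and I haven’t traced its actual parser); `naive_split` and `parse_charset` are hypothetical helpers for illustration only. The first splits on the slash alone, so trailing spaces stay glued to the names; the second ends each name at PDF white space or a delimiter, as the definitions above suggest. It assumes the CharSet string contains no backslash escapes and no #xx name escapes.

```python
import re

# Hypothetical illustration: split on '/' only, so any white space after a
# name stays attached to it ("A " is not the same glyph name as "A").
def naive_split(charset: str) -> list[str]:
    body = charset.strip("()")
    return body.split("/")

# A name runs from '/' up to the next white-space character (HT, LF, FF, CR,
# SP, as quoted from 4.46 above) or PDF delimiter: ( ) < > [ ] { } / %
_NAME = re.compile(r"/([^\t\n\f\r ()<>\[\]{}/%]*)")

def parse_charset(charset: str) -> list[str]:
    """Return the glyph names in a /CharSet string such as '( /A /B /C)'.

    Assumes no backslash escapes in the string and no #xx escapes in names.
    """
    body = charset.strip()
    if body.startswith("(") and body.endswith(")"):
        body = body[1:-1]  # drop the literal-string parentheses
    return [m.group(1) for m in _NAME.finditer(body)]
```

With this, both the old luatex form  (/A/B/C)  and the luatex 1.10 form  ( /A /B /C)  give the same three names, whereas the naive split yields  ' ', 'A ', 'B ', 'C'  for the latter. A regex that stops at white space or a delimiter is simply the tokenizing rule for names, so it needs no special case for where the spaces happen to sit.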

> 
> 
> Best regards,
> -- 
> Robert
> 

In short, I agree with Robert.  :-)

Cheers

	Ross