[pdftex] very subtle endobj bug in latest pdftex

Peter Selinger selinger at mathstat.dal.ca
Sat Aug 9 02:40:44 CEST 2014


Dear Ross and others,

I noticed two more bugs that seem related to this one, but I think
they might be distinct bugs. My version is 3.1415926-2.5-1.40.14 from
TeX Live 2013/Debian.

Bug 1. Missing newline after stream data and before "endstream"
===============================================================

This bug is a PDF/A syntax error in stream objects generated by
\pdfobj.

PDF permits an end-of-line marker between the end of the stream data
and the endstream keyword. See e.g. Section 3.2.7 of the PDF
Reference, version 1.6: "It is recommended that there be an
end-of-line marker after the data and before endstream; this marker is
not included in the stream length."

In usual PDF, this end-of-line marker is optional; however, in PDF/A,
it is mandatory.

pdftex does not always produce this required end-of-line marker.
Specifically, the newline is omitted when both of the following hold:
the stream is not compressed, and the stream data itself ends with a
newline. This triggers an error during PDF/A-1b validation. The
behavior is succinctly illustrated with this example document:

----------------------------------------------------------------------
\documentclass{article}
\pdfcompresslevel=0
\begin{document}
x
\immediate\pdfobj stream attr {/Type/Test} file {streamdata.txt}
\end{document}
----------------------------------------------------------------------

Here, the file streamdata.txt contains the 31 characters 
"This stream ends with a newline" followed by a newline character, for
a total file length of 32 bytes.
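For anyone who wants to reproduce this, the test file can be generated
with a few lines of Python. This is only a sketch; the helper name
make_stream_data is my own invention, and only the file name and
contents come from the description above.

```python
from pathlib import Path


def make_stream_data() -> bytes:
    """Return the test data: 31 ASCII characters plus one newline."""
    text = "This stream ends with a newline"
    return text.encode("ascii") + b"\n"


if __name__ == "__main__":
    data = make_stream_data()
    # Write in binary mode so no platform-specific newline translation occurs.
    Path("streamdata.txt").write_bytes(data)
    print(len(data), "bytes written")
```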

The output of pdflatex, as well as the relevant section of the report
from the Acrobat XI Pro Preflight tool, are here:

http://www.mathstat.dal.ca/~selinger/downloads/pdftex-bug1.pdf
http://www.mathstat.dal.ca/~selinger/downloads/audit1.png

Note that the generated PDF contains the following object:

1 0 obj <<
/Type/Test
/Length 32        
>>
stream
This stream ends with a newline
endstream
endobj

The /Length entry is correctly set to 32, which includes the newline
that is the last byte of the stream data. However, the *additional*
newline that PDF/A requires between the stream data and "endstream"
is missing, which triggers a validation error.
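The condition can also be checked mechanically. The following Python
sketch is not a PDF parser: it assumes a direct integer /Length in the
stream dictionary, and the names STREAM_RE and streams_missing_eol are
hypothetical. It uses /Length to find where the data ends and reports
streams where "endstream" follows immediately.

```python
import re

# A dictionary with a direct integer /Length, then the "stream" keyword
# and the end-of-line marker that follows it.
STREAM_RE = re.compile(rb"/Length\s+(\d+)[^>]*>>\s*stream(\r\n|\n)", re.S)


def streams_missing_eol(pdf: bytes) -> list:
    """Return offsets of stream data not followed by an EOL before endstream."""
    bad = []
    for m in STREAM_RE.finditer(pdf):
        start = m.end()                   # first byte of the stream data
        end = start + int(m.group(1))     # first byte after the data
        if pdf[end:].startswith(b"endstream"):
            bad.append(start)             # no EOL marker in between
    return bad
```

Run on the object above, this reports one offending stream; with the
extra newline added by hand, it reports none.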

For the above simple file, the Preflight report naturally has many
other errors too; however, they are not relevant to this bug. The
relevant error is the one that occurs in object { 1 0 obj } and two
other objects.

To show that the error is indeed caused by the missing newline, and
not by something else, I manually added the newline (and took away a
space so as not to upset the xref table). The fixed file and report
are here:

http://www.mathstat.dal.ca/~selinger/downloads/pdftex-bug1-fixed.pdf
http://www.mathstat.dal.ca/~selinger/downloads/audit1-fixed.png

Note that the error in { 1 0 obj } has gone away.

Bug 2. Missing EOL markers before "endobj" and after "obj"
==========================================================

In the PDF/A standard, it is mandatory that there is a newline after
"obj" and before "endobj". However, pdftex produces many instances of
"obj <<" on a single line, and ">> endobj" on a single line. For
example, all stream objects are formatted with "obj <<", as the above
example shows. pdftex also formats many other objects in this way,
including page objects, link annotations, font descriptors, and the
Catalog and Info dictionaries. Regardless of what packages are used,
these objects are all formatted in this style:

100 0 obj <<
/Type /Page
/Contents 102 0 R
/Resources 101 0 R
/MediaBox [0 0 612 792]
/Parent 110 0 R
>> endobj

The correct PDF/A syntax would be:

100 0 obj
<<
/Type /Page
/Contents 102 0 R
/Resources 101 0 R
/MediaBox [0 0 612 792]
/Parent 110 0 R
>>
endobj
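A rough way to count such objects in a given file is sketched below in
Python. The pattern and function names are my own, and the scan works
on raw bytes, so binary stream contents could in principle produce
false hits; treat it as a quick diagnostic rather than a validator.

```python
import re

# "N G obj" not immediately followed by an end-of-line character.
OBJ_NO_EOL = re.compile(rb"\d+ \d+ obj(?![\r\n])")
# "endobj" not immediately preceded by an end-of-line character.
ENDOBJ_NO_EOL = re.compile(rb"(?<![\r\n])endobj")


def count_eol_violations(pdf: bytes) -> tuple:
    """Return (objects lacking EOL after obj, objects lacking EOL before endobj)."""
    return (len(OBJ_NO_EOL.findall(pdf)), len(ENDOBJ_NO_EOL.findall(pdf)))
```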

Here's an example report showing the errors triggered by this syntax:
http://www.mathstat.dal.ca/~selinger/downloads/audit2.png

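I confirm that the errors go away if one fixes the syntax manually.
Both substitutions below swap a space for a newline, so the byte
offsets in the xref table remain valid. I ran: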

replace 'obj <<' $'obj\n<<' '>> endobj' $'>>\nendobj' -- S.pdf

and the resulting report has no errors:
http://www.mathstat.dal.ca/~selinger/downloads/audit2-fixed.png
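For those without a suitable command-line tool, here is a Python
sketch of the same substitution. The function name fix_obj_newlines is
hypothetical. Like the command above, it replaces bytes blindly, so it
could alter binary stream data that happens to contain these byte
sequences; it is meant for experiments, not as a real fix.

```python
def fix_obj_newlines(pdf: bytes) -> bytes:
    """Replace the space in 'obj <<' and '>> endobj' with a newline."""
    fixed = pdf.replace(b"obj <<", b"obj\n<<")
    fixed = fixed.replace(b">> endobj", b">>\nendobj")
    # Each replacement has the same length as the text it replaces,
    # so the byte offsets recorded in the xref table stay valid.
    assert len(fixed) == len(pdf)
    return fixed
```

Usage would be along the lines of reading S.pdf in binary mode,
passing the bytes through fix_obj_newlines, and writing the result
back in binary mode.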

-- Peter

Ross Moore wrote:
> Hi Karl, Thanh, and others.
> 
> I'm encountering a very subtle bug in pdftex v.1.40.15
> from TeXLive 2014.
> This seems to be the latest version available at  supelec.fr .
> 
> 
> The attached source and image file generated a PDF that
> exhibits the bug, which only shows up when you try to 
> validate for recent stricter PDF versions, such as PDF/A-3.
> 
> It is exhibited as follows, whereby an object is put into the
> PDF as  (near lines 3757-3763 of  graphic-test.pdf ):
> 
> >> 10 0 obj
> >> <<
> >> /I false
> >> /K false
> >> /CS /DeviceRGB
> >> /S /Transparency
> >> >>endobj
> 
> 
> Note the lack of endline character before the 'endobj' keyword.
> 
> This must be due to pdftex, since in the image file
>     Figure3-Asana-ai.pdf
> the corresponding object appears as:
> 
> >> 636 0 obj<</I false/K false/CS/DeviceRGB/S/Transparency>>
> >> endobj

