[pdftex] Redundant objects: patch available

Otfried Cheong otfried at cs.uu.nl
Fri May 4 14:49:49 CEST 2001


Andreas writes:
 > Hey, this is marvelous. I have just compiled it and did a few
 > tests. As one example I included the whole Adobe PDF Reference:
 > 
 > \documentclass[a4paper]{article}
 > \usepackage{pdfpages}
 > \begin{document}
 > \includepdf[pages=1-696]{PDFRef.pdf}
 > \end{document}

Perhaps not so surprisingly, this is one of the exact tests I did
myself :-)

 > The file (test-14hpatch.pdf) created with Otfried's patch is even
 > smaller than the pdf file from Adobe. Seems like Adobe has some
 > duplicated resources in their files. Now pdftex is first :-)

Before you jump to conclusions about Adobe's PDF generators:  If
PDFRef.pdf contains duplicated resources, those wouldn't be merged by
pdftex.  My patch stops resources from being embedded more than once;
it does not actively search for things to merge...
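
The distinction can be illustrated with a minimal sketch. It is not
pdftex's actual code: the function name, the dictionary of source
objects and the object numbers are all hypothetical. Copying is keyed
by the source object number, so an object referenced many times is
embedded once, while two distinct objects with identical contents are
both copied.

```python
def copy_once(refs, source_objects):
    """Copy each referenced source object at most once, keyed by object number."""
    copied = {}   # source object number -> output object number
    output = []   # contents written to the output file, in order
    for ref in refs:
        if ref not in copied:
            output.append(source_objects[ref])
            copied[ref] = len(output)
    return copied, output

# Hypothetical source file: objects 5 and 9 have identical contents
# but different object numbers.
source_objects = {
    5: b"<< /Type /Font /BaseFont /Helvetica >>",
    9: b"<< /Type /Font /BaseFont /Helvetica >>",
}

# Three references to object 5 and one to object 9.
copied, output = copy_once([5, 5, 5, 9], source_objects)
print(len(output))  # 2: object 5 is embedded once, 9 is not merged with 5
```

Merging 5 and 9 would require comparing object contents, which is the
active search the patch does not do.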

There are several possible reasons why test-14hpatch.pdf is nearly a
megabyte smaller than PDFRef.pdf.  Remember, objects are only copied
if they are referenced (directly or indirectly) from a page of the
document.  

  (1) There could be unused objects.

  (2) Document outlines, thumbnails, threads and named destinations
      are not copied. (PDFRef.pdf contains lots of named links, one
      per page, figure, table, etc., occupying a total of about
      400 kB.)

  (3) Known Type1 fonts are embedded by pdftex itself, so if the
      document contains extra resources for these, they are not
      copied.  This explains why the "ToUnicode" resource of these
      fonts is lost.

  (4) The hint tables of linearized PDF are lost (and the output is of
      course not linearized). 

  (5) FrameMaker might embed additional information for its own use.

  (6) There could be a bug in the copying code :-)
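
The "referenced directly or indirectly from a page" rule amounts to a
reachability search, which already accounts for reasons (1) and (2).
The following is only a sketch under simplifying assumptions, not the
copying code itself: the file is modelled as a hypothetical mapping
from each object number to the object numbers it references.

```python
def reachable(objects, roots):
    """Return the set of object numbers reachable from the given roots.

    objects maps an object number to the list of object numbers it references.
    """
    seen = set()
    stack = list(roots)
    while stack:
        num = stack.pop()
        if num in seen:
            continue
        seen.add(num)
        stack.extend(objects.get(num, []))
    return seen

# Hypothetical file: 1 = page, 2 = content stream, 3 = font,
# 4 = font descriptor, 5 = outline entry pointing at the page, 6 = unused.
objects = {1: [2, 3], 2: [], 3: [4], 4: [], 5: [1], 6: []}

kept = reachable(objects, roots=[1])
dropped = sorted(set(objects) - kept)
print(sorted(kept))  # [1, 2, 3, 4]
print(dropped)       # [5, 6]
```

The outline entry refers to the page, but no page refers to it, so it
is dropped along with the unused object.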

I still don't know exactly where the missing megabyte went.  I think
I'll have to write a small tool to compute PDF statistics (how many
bytes in what kind of data).
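
A rough sketch of what such a tool might look like follows. It is an
assumption-laden illustration, not an existing program: it scans the
raw bytes with a regular expression instead of reading the
cross-reference table, labels each object by the /Type entry in its
dictionary (or "untyped" if there is none), and would miscount a file
whose stream data happens to contain the word "endobj". The names
pdf_stats, OBJ_RE and TYPE_RE are made up for this example.

```python
import re
from collections import Counter

# An indirect object: "<num> <gen> obj ... endobj".
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj", re.DOTALL)
# The /Type entry of a dictionary, e.g. "/Type /Font".
TYPE_RE = re.compile(rb"/Type\s*/([A-Za-z0-9]+)")

def pdf_stats(data):
    """Return a Counter mapping an object kind to the total bytes it occupies."""
    totals = Counter()
    for match in OBJ_RE.finditer(data):
        body = match.group(3)
        # Only inspect the dictionary, not the stream data that may follow it.
        head = body.split(b"stream", 1)[0]
        kind = TYPE_RE.search(head)
        label = kind.group(1).decode("ascii") if kind else "untyped"
        totals[label] += len(match.group(0))
    return totals

# A tiny hand-made sample instead of a real file.
sample = (
    b"%PDF-1.3\n"
    b"1 0 obj\n<< /Type /Page /Contents 2 0 R >>\nendobj\n"
    b"2 0 obj\n<< /Length 5 >>\nstream\nhello\nendstream\nendobj\n"
    b"3 0 obj\n<< /Type /Font /Subtype /Type1 >>\nendobj\n"
)

stats = pdf_stats(sample)
for label, size in stats.most_common():
    print(label, size)
```

For a real file, the bytes would come from open(name, "rb").read().
Comparing the totals for PDFRef.pdf and test-14hpatch.pdf kind by kind
should show which category accounts for the difference.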

Otfried




