[pdftex] Redundant objects: patch available
Otfried Cheong
otfried at cs.uu.nl
Fri May 4 14:49:49 CEST 2001
Andreas writes:
> Hey, this is marvelous. I have just compiled it and did a few
> tests. As one example I included the whole Adobe PDF Reference:
>
> \documentclass[a4paper]{article}
> \usepackage{pdfpages}
> \begin{document}
> \includepdf[pages=1-696]{PDFRef.pdf}
> \end{document}
Perhaps not so surprisingly, this is one of the exact tests I did
myself :-)
> The file (test-14hpatch.pdf) created with Otfried's patch is even
> smaller than the PDF file from Adobe. Seems like Adobe has some
> duplicated resources in their files. Now pdftex is first :-)
Before you jump to conclusions about Adobe's PDF generators: If
PDFRef.pdf contains duplicated resources, those wouldn't be merged by
pdftex. My patch stops resources from being embedded more than once,
it does not actively search for things to merge...
There are several possible reasons why test-14hpatch.pdf is nearly a
megabyte smaller than PDFRef.pdf. Remember, objects are only copied
if they are referenced (directly or indirectly) from a page of the
document.
(1) There could be unused objects.
(2) Document outlines, thumbnails, threads and named destinations
are not copied. (PDFRef.pdf contains lots of named links, one
per page, figure, table, etc., occupying a total of about
400kB.)
(3) Known Type1 fonts are embedded by pdftex itself, so if the
document contains extra resources for these, they are not
copied. This explains why the "ToUnicode" resource of these
fonts is lost.
(4) The hint tables of linearized PDF are lost (and the output is of
course not linearized).
(5) FrameMaker might embed additional information for its own use.
(6) There could be a bug in the copying code :-)
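The rule behind (1) and (2), that an object is copied only if it is
reachable from a page, can be illustrated with a minimal sketch. This is
not the patch's code: the object numbers and the reference graph below
are made up, and the function name reachable is hypothetical.

```python
# Toy model: each object number maps to the object numbers it references.
# The numbers and the graph are invented for illustration only.
def reachable(refs, roots):
    """Return the set of objects reachable from the given root objects."""
    seen = set()
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if obj in seen:
            continue  # already visited, so shared objects are handled once
        seen.add(obj)
        stack.extend(refs.get(obj, ()))
    return seen

refs = {
    1: [2, 3],  # page -> content stream, font
    2: [],      # content stream
    3: [4],     # font -> font descriptor
    4: [],      # font descriptor
    5: [6],     # outline root -> outline item (not referenced from a page)
    6: [],      # outline item
}

copied = reachable(refs, roots=[1])   # start from the page object
dropped = set(refs) - copied          # everything else is left behind
```

In this toy graph the outline objects 5 and 6 are never visited, which
is the same effect as the outlines and unused objects going missing in
the output file.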
I still don't know exactly where the missing megabyte went. I think
I'll have to write a small tool to compute PDF statistics (how many
bytes in what kind of data).
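A rough sketch of what such a statistics tool could look like, in
Python. It is deliberately naive and rests on several assumptions: it
scans the raw file for "N G obj ... endobj" spans with a regular
expression instead of parsing the cross-reference table, it groups
objects by their /Type name only, and it can be fooled by binary stream
data that happens to contain the bytes "endobj". The name pdf_stats is
hypothetical.

```python
import re
from collections import Counter

# Naive patterns: an indirect object span and a "/Type /Name" entry.
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj", re.DOTALL)
TYPE_RE = re.compile(rb"/Type\s*/([A-Za-z0-9]+)")

def pdf_stats(data):
    """Sum the bytes of each indirect object, grouped by its /Type name."""
    totals = Counter()
    for m in OBJ_RE.finditer(data):
        body = m.group(3)
        # Look for /Type only in the part before any stream data.
        head = body.split(b"stream", 1)[0]
        t = TYPE_RE.search(head)
        if t:
            kind = t.group(1).decode("ascii")
        elif b"stream" in body:
            kind = "(untyped stream)"
        else:
            kind = "(untyped)"
        # Count the whole span, including the "obj"/"endobj" keywords.
        totals[kind] += len(m.group(0))
    return totals

# Tiny hand-made sample, not a valid PDF file.
sample = (
    b"%PDF-1.3\n"
    b"1 0 obj\n<< /Type /Page >>\nendobj\n"
    b"2 0 obj\n<< /Length 5 >>\nstream\nhello\nendstream\nendobj\n"
)
stats = pdf_stats(sample)
```

On a real file one would read the bytes with open(name, "rb").read()
and compare the totals for the original and the pdftex output; the
difference between the sum of the totals and the file size gives the
bytes that sit outside any object (header, xref table, trailer).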
Otfried