[pdftex] Redundant objects: patch available

Han The Thanh thanh at informatics.muni.cz
Fri May 4 14:56:46 CEST 2001


>  > The file (test-14hpatch.pdf) created with the Ofrieds patch is even
>  > smaller than the pdf file from Adobe. Seems like Adobe has some
>  > duplicated resources in their files. Now pdftex is first :-)
> 
> Before you jump to conclusions about Adobe's PDF generators:  If
> PDFRef.pdf contains duplicated resources, those wouldn't be merged by
> pdftex.  My patch stops resources from being embedded more than once,
> it does not actively search for things to merge...
> 
> There are several possible reasons why test-14hpatch.pdf is nearly a
> megabyte smaller than PDFRef.pdf.  Remember, objects are only copied
> if they are referenced (directly or indirectly) from a page of the
> document.  
> 
>   (1) There could be unused objects.
> 
>   (2) Document outlines, thumbnails, threads and named destinations
>       are not copied. (PDFRef.pdf contains lots of named links, one
>       per page, figure, table, etc., occupying a total of about
>       400kB.)
> 
>   (3) Known Type1 fonts are embedded by pdftex itself, so if the
>       document contains extra resources for these, they are not
>       copied.  This explains why the "ToUnicode" resource of these
>       fonts is lost.
> 
>   (4) The hint tables of linearized PDF are lost (and the output is of
>       course not linearized). 
> 
>   (5) FrameMaker might embed additional information for its own use.
> 
>   (6) There could be a bug in the copying code :-)
> 
> I still don't know exactly where the missing megabyte went.  I think
> I'll have to write a small tool to compute PDF statistics (how many
> bytes in what kind of data).

The PDF spec is quite a `rich' PDF, with a lot of outlines, annotations and
the like. I think these elements can take up quite a lot of space
(400KB seems too small).
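
A rough sketch of the kind of statistics tool mentioned above (how many
bytes in what kind of data) might look like the following. This is not the
tool from the thread; the names pdf_stats and classify are made up for
illustration. It assumes a PDF whose indirect objects are stored in plain
"N G obj ... endobj" form, groups them by their /Type entry, and can be
fooled by binary stream data that happens to contain "endobj".

```python
import re
from collections import Counter

# Crude match for one indirect object: "N G obj ... endobj".
# Assumption: objects are not packed into object streams.
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj", re.DOTALL)

# "/Type /Name" inside an object's dictionary. This does not match
# "/Subtype" or "/Type1", because a second "/" must follow "/Type".
TYPE_RE = re.compile(rb"/Type\s*/([A-Za-z0-9]+)")


def classify(body):
    """Return a rough category for one object body (hypothetical helper)."""
    # Look only at the part before any stream data.
    head = body.split(b"stream", 1)[0]
    m = TYPE_RE.search(head)
    if m:
        return m.group(1).decode("ascii")
    if b"stream" in body:
        return "(untyped stream)"
    return "(other)"


def pdf_stats(data):
    """Sum the bytes of every indirect object, grouped by category."""
    sizes = Counter()
    for m in OBJ_RE.finditer(data):
        sizes[classify(m.group(3))] += len(m.group(0))
    return sizes
```

To use it, read a file in binary mode and print the totals, for example
pdf_stats(open("PDFRef.pdf", "rb").read()).most_common(). Comparing the
output for the original and the re-embedded file should at least show
which categories (Outlines, Annot, and so on) account for the difference.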

Regards,
Thanh


