[pdftex] Merging duplicate embedded fonts

Tue Oct 8 21:58:47 CEST 2013

Hello Maarten,

On 07/10/2013, at 19:53, Maarten Bezemer <m.m.bezemer at utwente.nl> wrote:

> Hello,
> 
> I have a problem with pdflatex asked about at 
> http://tex.stackexchange.com/questions/136574/merging-duplicate-embedded-fonts
> 
> In the end it was suggested that I ask here.
> 
> I have a LaTeX project that contains 2 PDF images. Both have a text with the 
> same font that is fully embedded (not subset, to keep things simple).
> The PDF that is created from the LaTeX sources does contain the font twice, 
> once for each included image. I suppose that pdflatex should keep only one 
> copy, especially since both fonts are fully embedded so it is easy to 
> determine they are duplicate.

My understanding is that pdfTeX does not try to parse the internal structure of embedded files, whether they are in PDF or another format. It just assumes that they will work within the context in which you are embedding them, and takes no further responsibility apart from including them as-is, within an appropriate PDF XObject wrapper.

How can the software know that the font included within each image is indeed the same?
Even if two fonts are named similarly and occupy the same number of bits, this doesn't preclude their internal structure being different. This could well be the case for two different subsets of the same base font, used in images each having just small amounts of text. But pdfTeX doesn't even look for what fonts are in the image, so no comparison will ever be made.
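
To illustrate the point, here is a minimal sketch in Python of the kind of byte-level check a tool would need before it could safely treat two embedded fonts as one. This is not something pdfTeX does. The function name and the sample byte strings are made up for the example; real font streams would first have to be extracted and decoded from the PDF.

```python
import hashlib


def same_font_program(a: bytes, b: bytes) -> bool:
    """Treat two decoded font streams as identical only if their bytes match."""
    return hashlib.sha256(a).digest() == hashlib.sha256(b).digest()


# Made-up stand-ins for decoded font streams (not real font data).
full_copy_1 = b"glyphs:ABCDEFGH"
full_copy_2 = b"glyphs:ABCDEFGH"
other_subset = b"glyphs:ABCDWXYZ"  # same length, different content

# Equal size alone says nothing about equal content.
sizes_match = len(full_copy_1) == len(other_subset)

print(sizes_match)
print(same_font_program(full_copy_1, full_copy_2))
print(same_font_program(full_copy_1, other_subset))
```

Only a comparison of the content itself can separate the second case from the third, and that requires opening up each included image.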

> 
> My LaTeX file is as follows:
> 
> \documentclass{article}
> \usepackage{graphicx}
> 
> \begin{document}
> \includegraphics{image1}
> \includegraphics{image2}
> \end{document}
> 
> 
> pdffonts shows:
> $ pdffonts mydoc.pdf 
> name                                 type              encoding         emb sub uni object ID
> ------------------------------------ ----------------- ---------------- --- --- --- ---------
> SDXKYB+CMR10                         Type 1            Builtin          yes yes no       6  0
> DejaVuSans                           TrueType          WinAnsi          yes no  yes     11  0
> DejaVuSans                           TrueType          WinAnsi          yes no  yes     17  0
> 
> In my original document I have lots of images containing texts, resulting in 
> lots of duplicate fonts. Obviously, I normally use subsets reducing the size 
> of the final document. But I am (also) not able to merge the duplicate 
> subsets... So I thought to use fully embedded fonts, but those do also not 
> properly merge...
> 
> Am I doing something wrong resulting in the duplicate fonts? Or did I 
> encounter a bug in pdf(la)tex?

pdfTeX is doing nothing wrong.
If you want to combine the fonts for each image, then those images can no longer be considered as independent objects within the PDF. Suppose you want to isolate and extract an image from the final PDF. The reader software will have to be smarter than just extracting a simple XObject. It will need to build a new PDF on the fly, containing all the required fonts, appropriately referenced. Some PDF readers may be able to do this, but others will not.

Presumably you want a size reduction in your final PDF. This can only come as a compromise: in functionality, depending on the PDF reader used by your audience, and/or at the expense of extra processing when the full document is created.

Ghostscript has been suggested already.
Or try using Acrobat Pro to save a "reduced size" PDF, as an extra step after pdfTeX.
Whether the latter will work may depend upon the characteristics of the font; in particular whether it is already known to the software installation, and its licensing conditions.
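
For the Ghostscript route, the extra step could be scripted along the following lines. This is only a sketch: it assumes a `gs` executable on the PATH, the helper names and the output file name `mydoc-gs.pdf` are invented for the example, and whether the duplicate fonts actually get merged will depend on the Ghostscript version and on the fonts themselves.

```python
import shutil
import subprocess


def gs_rewrite_command(src, dst, gs="gs"):
    """Build a Ghostscript command that rewrites src through the pdfwrite device."""
    return [
        gs,
        "-dBATCH",             # exit after processing the input
        "-dNOPAUSE",           # do not prompt between pages
        "-sDEVICE=pdfwrite",   # write PDF output
        "-sOutputFile=" + dst,
        src,
    ]


def rewrite_pdf(src, dst):
    """Run Ghostscript on src, writing dst; raises if gs is missing or fails."""
    if shutil.which("gs") is None:
        raise RuntimeError("no 'gs' executable found on PATH")
    subprocess.run(gs_rewrite_command(src, dst), check=True)


# Hypothetical usage, as a post-processing step after pdfTeX:
# rewrite_pdf("mydoc.pdf", "mydoc-gs.pdf")
```

Running pdffonts on the rewritten file afterwards will show whether DejaVuSans is still listed twice.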

> 
> The resulting PDF file is available online [1],

I'll give Acrobat Pro a try and get back to you with the results.

> as well as all the source files [2]
> 
> Best regards,
>  Maarten
> 
> 
> [1]: https://dl.dropboxusercontent.com/u/9671810/mydoc.pdf
> [2]: https://dl.dropboxusercontent.com/u/9671810/mydoc.zip

Cheers,

     Ross