[pdftex] redundant objects with includegraphics

Wed May 2 14:40:11 CEST 2001

Andreas Matthias amat at kabsi.at writes:
 > 
 > > So, if people agree that my suggested extension to \pdfximage is
 > > useful, and nobody more qualified takes it up, I will go and try to
 > > implement it in pdftex.  I guess I could do this in a day, while a
 > > postprocessor looks a bit more work :-)
 > 
 > This would be great. But IMHO this is just the first step. The
 > second would be to find someone to include your extensions
 > in `pdftex.def' (`graphicx.sty'), so that we don't lose the
 > wonderful things that \includegraphics can do (trim, clip, rotate,...).
 > How much effort would it require to do that? 

That is of course a good point.  It seems difficult for a macro
package to allow users to use the extension as I proposed it.  In any
case, 'pdftex.def' does not currently seem to have support for
including a selected page from a PDF file, and can only include
single-page documents (or the first page of multi-page documents).  So
at this moment, my proposal is irrelevant for 'graphicx.sty'.

However, with Thanh's encouragement, I've looked at the pdftex code a
bit more, and here is a refined proposal that would require no changes
in any macro package.  The idea is as follows: At the beginning of the
document, a user requests that the PDF file "a.pdf" be embedded by
saying "\pdfembed{a.pdf}".  This would create PDF objects for all the
pages of "a.pdf", and all their resources, without duplication.

From then on, users can include pages from "a.pdf" normally,
using "\pdfximage page <n> {a.pdf}" (or "\includegraphics", once it
has support for page selection).  This would work fine, because
"\pdfximage" would know that "a.pdf" is already embedded and would
simply return a reference to the XObject.
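
For illustration, a document using the proposal might look roughly
like the sketch below.  Note that \pdfembed is only the primitive
proposed here (pdftex does not provide it), and the page numbers are
arbitrary:

	\pdfembed{a.pdf}   % proposed: embed all pages and resources once
	Some text...
	\pdfximage page 2 {a.pdf}    % would find page 2 already embedded
	\pdfrefximage\pdflastximage
	more text...
	\pdfximage page 5 {a.pdf}    % likewise, no second copy is made
	\pdfrefximage\pdflastximage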

This feature could be generalized a bit.

Currently, if one does this:

	Some text...
	\includegraphics{a.pdf}
	more text...
	\includegraphics{a.pdf}
	still some text...,

one ends up with TWO copies of the embedded PDF file "a.pdf".
Obviously one can avoid that using the \pdfrefximage feature, but I
don't think \includegraphics gives access to that.  One can still
avoid the problem using \setbox and \usebox (but that doesn't scale
arbitrarily).
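
As a sketch of the two workarounds, assuming a document processed
with pdftex (the names \apdf and \abox are made up for this example):
with the primitives, one loads the file once, saves the object number
and refers to it as often as needed.

	\pdfximage{a.pdf}                % embed "a.pdf" once
	\edef\apdf{\the\pdflastximage}   % remember its object number
	Some text...
	\pdfrefximage\apdf\relax         % first use
	more text...
	\pdfrefximage\apdf\relax         % second use, same object

With LaTeX boxes, the image is included once into a saved box, and
the box is then used repeatedly:

	\newsavebox{\abox}
	\sbox{\abox}{\includegraphics{a.pdf}}   % embed "a.pdf" once
	Some text...
	\usebox{\abox}
	more text...
	\usebox{\abox}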

Now, pdftex already builds a table of all the included images, with
their filenames and page numbers.  We could easily modify the
semantics of "\pdfximage" to return the reference number of the
already existing image in its table, if the filename and page number
match (slight catch: the "attr" attributes of the two requests may
differ).  The effect would be that the code fragment above would
only embed "a.pdf" once, and reuse it as often as necessary.
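
The same lookup can be imitated at the macro level, which may help to
show the intended behaviour.  The following is only a rough sketch
and not how pdftex would implement it internally: the macro name
\cachedximage and the "ximg@" prefix are invented for this example,
and the key is the filename alone, ignoring page numbers and attr.

	\makeatletter
	% \cachedximage{<file>}: embed <file> on first use only, then
	% always refer to the object number stored for that filename.
	\newcommand*\cachedximage[1]{%
	  \@ifundefined{ximg@#1}{%
	    \pdfximage{#1}%
	    \expandafter\xdef\csname ximg@#1\endcsname{\the\pdflastximage}%
	  }{}%
	  \pdfrefximage\csname ximg@#1\endcsname\relax
	}
	\makeatother

Calling \cachedximage{a.pdf} twice then executes \pdfximage only
once; the second call finds the stored number and goes straight to
\pdfrefximage.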

With these changed semantics, "\pdfembed" is then trivial: It just
embeds all the pages and resources at once, and adds entries to the
figure table for each page.  Future attempts to load individual pages
from "a.pdf" using \pdfximage will simply reference the already
embedded pages.

Is this desirable?  Or is there a reason for the current design that
does not check the image table, and leaves object reuse to the user
using \pdfrefximage?  I cannot come up with a reason why one would
want to embed the same file multiple times, but perhaps my thinking
isn't twisted enough yet :-)

Otfried