[XeTeX] xdvipdfmx, page subsets, pgf, transparency

Sun May 15 23:49:36 CEST 2011

I have a large document, approximately 1800 pages, with a lot of graphics
created by PGF/TikZ and heavily using transparency.  I would like to use
XeTeX to generate one large XDV file, then run xdvipdfmx several times
with different -s options to make PDFs of subranges of the pages.  (See
appendix below for *why* I want to do this in this particular way.)

The problem is that when I run xdvipdfmx on less than the entire document,
I get this message:

** WARNING ** Unresolved object reference "pgfopacities" found!!!

and all the transparent elements in the pictures come out as opaque in the
output.  It works correctly when I run xdvipdfmx on the *entire* file.

It appears that PGF generates some kind of PostScript or PDF code in a
special in the XDV file, on the page that contains the first use of
transparency in the entire document.  If that page is included in the
subset, xdvipdfmx passes the special through and is happy.  If that page is
not included in the subset, xdvipdfmx doesn't know the special must be
preserved, so it drops the special but still passes through other things
that depend on it, and then it reports the unresolved reference.

I have the beginnings of a work-around:  if I can figure out which page in
the large document is the first one to use PGF's transparency features, I
can just make sure that that page is included in the subset I request from
xdvipdfmx.  I've tested this manually and it looks like it should work.
Transparency is first used on page 47, and no subset that includes page 47
produces the error.  But I'd like it to be more automated. I may be able
to insert an invisible dummy image that uses transparency on an early page
of the large document, so that "the first use of transparency" will always
be on a predictable page number; that should help.
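
Here is a rough sketch of how the work-around could be scripted.  It
assumes that -s accepts comma-separated ranges of physical page numbers,
that ranges are emitted in the order listed, and that it is enough for the
anchor page to appear anywhere in the subset; I have only confirmed the
last point by hand.  The function name, the file names, and the chunk size
are all made up for illustration.

```python
def subset_commands(xdv_file, total_pages, chunk_size, anchor_page):
    """Build one xdvipdfmx command line per chunk of pages.

    anchor_page is the first page that uses transparency (47 in my
    document).  Returns a list of (argv, has_extra_page) pairs, where
    has_extra_page says whether the anchor page was prepended.
    """
    commands = []
    for first in range(1, total_pages + 1, chunk_size):
        last = min(first + chunk_size - 1, total_pages)
        ranges = f"{first}-{last}"
        # A chunk that starts at or before the anchor page either contains
        # it or ends before transparency is ever used, so it needs no help.
        has_extra_page = first > anchor_page
        if has_extra_page:
            ranges = f"{anchor_page}-{anchor_page},{ranges}"
        output = f"part-{first:04d}-{last:04d}.pdf"  # hypothetical naming
        argv = ["xdvipdfmx", "-s", ranges, "-o", output, xdv_file]
        commands.append((argv, has_extra_page))
    return commands


if __name__ == "__main__":
    for argv, extra in subset_commands("book.xdv", 1800, 500, 47):
        print(" ".join(argv), "(extra first page)" if extra else "")
```

The has_extra_page flag is there so that the later pdfpages stage can skip
the first page of exactly those small PDFs that carry the anchor page.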

Since I'll be subsetting the pages again with pdfpages later, it should be
no problem to have an extra page in each small PDF.  A similar issue might
arise at the pdfpages stage, but I guess I can cross that bridge when I
come to it.

However, I think this is a bug in xdvipdfmx.  It is clearly smart enough
to know that it has emitted an object that requires "pgfopacities"; and it
is smart enough to recognize the definition of "pgfopacities" when it sees
one; if those two things were not both true, then it wouldn't be able to
detect the error condition.  So it should be able to automatically emit
the definition of pgfopacities when it is required.  If xdvipdfmx has all
the pieces to write correct output, and can detect that the output is
incorrect, but it writes incorrect output anyway, then I don't think pgf,
the user, or any other part of the system can be blamed.

I recognize that trying to handle this correctly might break the
single-pass nature of xdvipdfmx, if xdvipdfmx is trying to be single-pass
- because when it first sees the definition of "pgfopacities" on a page
that isn't being emitted, it doesn't know whether that definition will be
needed by some future page.  Dealing with all such things in general would
require xdvipdfmx to save all definitions on the side and then emit them
if and when they are needed, which means storing a lot of data that may
never be used.  But one way to work around it without
imposing a lot of extra record-keeping would be for xdvipdfmx to accept an
option listing the names of objects that *must* be passed through; then as
a user I could just specify that option for "pgfopacities" and get output
that works.  At present, the only way I can have it write out a given
special seems to be to request a page I don't really want that
happens to contain that special.
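
To make the proposal concrete, here is a toy model of the behaviour I am
describing.  It is not based on xdvipdfmx's actual internals, which I have
not read; the page representation, the function name, and the always_keep
parameter (standing in for the proposed option) are all hypothetical.

```python
def convert_subset(pages, selected, always_keep=frozenset()):
    """Toy single-pass conversion of a subset of pages.

    pages is a list of dicts with optional "defines" (name -> body) and
    "uses" (set of names) entries; selected is a set of 1-based page
    numbers.  Returns (emitted_definitions, unresolved_names).
    """
    emitted = {}
    referenced = set()
    for number, page in enumerate(pages, start=1):
        is_selected = number in selected
        for name, body in page.get("defines", {}).items():
            # Current behaviour as I understand it: a definition survives
            # only if its page is selected.  The proposed option would
            # also keep any definition whose name the user listed.
            if is_selected or name in always_keep:
                emitted[name] = body
        if is_selected:
            referenced.update(page.get("uses", ()))
    unresolved = referenced - set(emitted)
    return emitted, unresolved


pages = [
    {"defines": {"pgfopacities": "<< ... >>"}, "uses": {"pgfopacities"}},
    {"uses": {"pgfopacities"}},
]
print(convert_subset(pages, {2}))
print(convert_subset(pages, {2}, always_keep={"pgfopacities"}))
```

In this model the first call reports "pgfopacities" as unresolved and the
second does not, and the only extra state kept is the set of names given
by the user.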
-- 
Matthew Skala
mskala at ansuz.sooke.bc.ca                 People before principles.
http://ansuz.sooke.bc.ca/

Appendix:

This is not directly relevant to the problem, but both for interested
readers and to forestall "You shouldn't want to do that, do something else
that won't meet your needs instead!" responses:  The small PDFs will be
used with the pdfpages package in a second run of XeLaTeX to generate PDFs
for printing the large document as a multi-volume set of books.  It's
preferable to use many small intermediate PDFs instead of using pdfpages
to take pages directly from the original large document, because pdfpages
seems to take a time *per page* proportional to the size of the document
it is examining; thus the overall process is quadratic without splitting
into small PDFs, and linear with splitting.  It's a difference of several
hours of processing time at the moment and likely to become a difference
of a few days per run at some point in the future, because the large
document is planned to grow to three or four times its current size (i.e.
5000 to 7000 pages) within the scope of the project that it's part of.
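
The scaling argument can be put as a back-of-the-envelope cost model.  It
assumes, as described above, that pdfpages spends time on each included
page proportional to the page count of the PDF it reads from; the unit of
cost is arbitrary, the chunk size of 100 is just an example, and the extra
anchor page per chunk is ignored.

```python
def pdfpages_cost(total_pages, chunk_size=None):
    """Relative cost of including every page once via pdfpages.

    Each included page is assumed to cost as many units as there are
    pages in the PDF it is taken from.
    """
    if chunk_size is None:
        # Every page is taken from the one large PDF.
        return total_pages * total_pages
    cost = 0
    for first in range(0, total_pages, chunk_size):
        size = min(chunk_size, total_pages - first)
        cost += size * size  # 'size' pages, each costing 'size' units
    return cost


for pages in (1800, 7000):
    whole = pdfpages_cost(pages)
    split = pdfpages_cost(pages, chunk_size=100)
    print(pages, whole, split, whole // split)
```

Under these assumptions the unsplit cost grows with the square of the page
count while the split cost grows linearly, so the ratio between them
widens as the document grows.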