[XeTeX] xdvipdfmx, page subsets, pgf, transparency

Heiko Oberdiek heiko.oberdiek at googlemail.com
Mon May 16 13:42:32 CEST 2011


On Sun, May 15, 2011 at 04:49:36PM -0500, mskala at ansuz.sooke.bc.ca wrote:

> I have a large document, approximately 1800 pages, with a lot of graphics
> created by PGF/TikZ and heavily using transparency.  I would like to use
> XeTeX to generate one large XDVI file, then run xdvipdfmx several times
> with different -s options to make PDFs of subranges of the pages.  (See
> appendix below for *why* I want to do this in this particular way.)
> 
> The problem is that when I run xdvipdfmx on less than the entire document,
> I get this message:
> 
> ** WARNING ** Unresolved object reference "pgfopacities" found!!!
> 
> and all the transparent elements in the pictures come out as opaque in the
> output.  It works correctly when I run xdvipdfmx on the *entire* file.
> 
> It appears that PGF generates some kind of PostScript code in a special in
> the XDVI file, on the page that contains the first use of transparency
> in the entire document.  If that page is included in the subset, xdvipdfmx
> passes it through and is happy.  If that page is not included in the
> subset, then xdvipdfmx doesn't know the special must be preserved, and so
> it doesn't pass the special through but does pass through other
> things that depend on the special, and it detects an error.

Operations like page subsetting, page reordering, ... depend on
the use of specials.

In your case the graphics software (pgf, tikz or whatever)
uses an object (pgfopacities, the name is arbitrary) for transparency
(perhaps for collecting the opacity values) and reuses it later.

> I have the beginnings of a work-around:  if I can figure out which page in
> the large document is the first one to use PGF's transparency features, I
> can just make sure that that page is included in the subset I request from
> xdvipdfmx.  I've tested this manually and it looks like it should work.
> Transparency is first used on page 47, and no subset that includes page 47
> produces the error.
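
That work-around can at least be automated at the call site. A
hypothetical Python wrapper (the -s page-range option is xdvipdfmx's
subset switch; the anchor page 47, the file names, and the function
names are just illustrations taken from your numbers):

```python
import subprocess

# Page that carries the transparency-defining special; 47 in this
# document (an assumption from the posting -- adjust as needed).
ANCHOR_PAGE = 47

def xdvipdfmx_args(xdv, out_pdf, pages, anchor_page=ANCHOR_PAGE):
    """Build an xdvipdfmx command line whose -s subset always starts
    with the anchor page, so the pgf specials are passed through."""
    return ["xdvipdfmx", "-s", f"{anchor_page},{pages}", "-o", out_pdf, xdv]

def make_subset_pdf(xdv, out_pdf, pages):
    subprocess.run(xdvipdfmx_args(xdv, out_pdf, pages), check=True)
```

Note that this prepends page 47 to every subset, so each partial PDF
gains one extra page that has to be dropped or ignored later (e.g. by
pdfpages).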

I have analyzed a simple example of PGF's transparency (current TL version):

\documentclass{article}
\usepackage{tikz}
\begin{document}
Hello World
\newpage
\begin{tikzpicture}[line width=1ex]
  \draw (0,0) -- (3,1);
  \filldraw [fill=red,draw opacity=0.5] (1,0) rectangle (2,1);
\end{tikzpicture}
\newpage
\begin{tikzpicture}[line width=1ex]
  \draw (0,0) -- (3,1);
  \filldraw [fill=red,draw opacity=0.3] (1,0) rectangle (2,1);
\end{tikzpicture}
\newpage
\begin{tikzpicture}[line width=1ex]
  \draw (0,0) -- (3,1);
  \filldraw [fill=red,draw opacity=0.3] (1,0) rectangle (2,1);
\end{tikzpicture}
\end{document}

Page 1: no transparency
Page 2: opacity = 0.5
Page 3: opacity = 0.3
Page 4: opacity = 0.3

After changing the id byte of the pre- and postamble of the .xdv file
I could use dvii to analyze the file, especially the specials
(numbered at the beginning for easier referencing):

2.1 s:[2/2]:: pdf: obj @pgfextgs <<>>
2.2 s:[2/2]:: pdf: put @resources << /ExtGState @pgfextgs >>
2.3 s:[2/2]:: pdf: put @pgfextgs << /pgf@CA0.5 << /CA 0.5 >> >>

3.1 s:[3/3]:: pdf: put @resources << /ExtGState @pgfextgs >>
3.2 s:[3/3]:: pdf: put @pgfextgs << /pgf@CA0.3 << /CA 0.3 >> >>

4.1 s:[4/4]:: pdf: put @resources << /ExtGState @pgfextgs >>
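
Read as instructions, these specials behave roughly like this (a
sketch with Python dicts standing in for PDF dictionary objects;
"obj" creates a named dictionary, "put" merges entries into it, and
@resources is assumed to be predefined by the driver):

```python
# Plain dicts stand in for PDF dictionary objects.
objects = {"resources": {}}   # @resources: assumed predefined by xdvipdfmx

def obj(name):
    """pdf: obj @name <<>> -- declare a named, empty dictionary."""
    objects[name] = {}

def put(name, entries):
    """pdf: put @name << ... >> -- merge entries into that dictionary."""
    objects[name].update(entries)

obj("pgfextgs")                                  # special 2.1
put("resources", {"ExtGState": "@pgfextgs"})     # special 2.2
put("pgfextgs", {"pgf@CA0.5": {"CA": 0.5}})      # special 2.3
put("pgfextgs", {"pgf@CA0.3": {"CA": 0.3}})      # special 3.2
```

Dropping page 2 from a subset drops the "obj" declaration, and every
later "put @pgfextgs" then refers to an unknown name.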

Special 2.1 is also needed for pages 3 and 4, otherwise the
object "pgfextgs" is not known; the software collects the
opacity values there. For page 4 the value 0.3 was already put
in special 3.2 of page 3.
Thus you would need pages 2 and 3 for every subset that
contains page 4.
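
The warning itself is easy to reproduce: an object name is unresolved
if it is referenced in a subset's specials but its "obj" declaration
lies on an excluded page. A rough sketch over dvii-style special
strings (the set of names predefined by the driver is an assumption,
and the real list is longer):

```python
import re

# Object names assumed to be predefined by dvipdfmx/xdvipdfmx, so
# they never count as unresolved (an assumption for this sketch).
BUILTIN = {"resources", "thispage", "catalog"}

def unresolved_objects(pages, specials_by_page):
    """Names referenced but not declared within the given page subset,
    i.e. the condition behind 'Unresolved object reference'."""
    defined, referenced = set(), set()
    for p in pages:
        for s in specials_by_page.get(p, []):
            m = re.match(r"pdf:\s*obj\s+@(\w+)", s)
            if m:
                defined.add(m.group(1))
            # '@name' counts as a reference only at a token boundary,
            # so /pgf@CA0.5 inside a key is not picked up.
            referenced.update(re.findall(r"(?:^|\s)@(\w+)", s))
    return referenced - defined - BUILTIN
```

With the specials listed above, a subset of pages 3 and 4 alone
reports pgfextgs as unresolved; adding page 2 resolves it.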

> But I'd like it to be more automated. I may be able
> to insert an invisible dummy image that uses transparency on an early page
> of the large document, so that "the first use of transparency" will always
> be on a predictable page number; that should help.

Then you would have to modify the software involved so that it
collects all this information in an auxiliary file and puts it on
the first page in the next run. (Or on the last page, but then you
also need two runs, to know which page is last.)

> However, I think this is a bug in xdvipdfmx. It is clearly smart enough
> to know that it has emitted an object that requires "pgfopacities"; and it
> is smart enough to recognize the definition of "pgfopacities" when it sees
> that; if those two things were not both true, then it wouldn't be able to
> detect the error condition. So it should be able to automatically emit
> the definition of pgfopacities when it is required. If xdvipdfmx has all
> the pieces to write correct output, and can detect that the output is
> incorrect, but it writes incorrect output anyway, then I don't think pgf,
> the user, or any other part of the system can be blamed.

And probably xdvipdfmx cannot be blamed either. The user explicitly
excludes pages from the output. xdvipdfmx could analyze the excluded
pages and take some actions based on their specials, but such actions
might have effects on the whole document. Sometimes this is good, to
avoid missing object declarations and other problems. But sometimes
extra or wrong stuff would be added because of pages that the user
has explicitly excluded.

> This is not directly relevant to the problem, but both for interested
> readers and to forestall "You shouldn't want to do that, do something else
> that won't meet your needs instead!" responses:  The small PDFs will be
> used with the pdfpages package in a second run of XeLaTeX to generate PDFs
> for printing the large document as a multi-volume set of books.

The purpose of the second step is not clear to me:
the final result consists of several PDF files, one for each book?
And the first XDV file is just the contents of the books, with
perhaps some pages reused in each book, so that the second run
only puts the pages together?

> It's
> preferable to use many small intermediate PDFs instead of using pdfpages
> to take pages directly from the original large document, because pdfpages
> seems to take a time *per page* proportional to the size of the document
> it is examining; thus the overall process is quadratic without splitting
> into small PDFs, and linear with splitting.  It's a difference of several
> hours processing time at the moment and likely to become a difference of a
> few days per run at some point in the future, because the large document
> is planned to grow to three or four times its current size (i.e. 5000 to
> 7000 pages) within the scope of the project that it's part of.

Then I would suggest writing a program that deals with the XDV file:
a) splitting the single master .xdv file into the book .xdv files
b) analyzing the specials and adding the missing ones to each
   book's .xdv file.

If I have understood the second XeTeX run correctly, it would then
no longer be necessary, saving you much time.

Item a) is much easier for .dvi/.xdv files than for .tex, .ps, or
.pdf files, because the .dvi/.xdv format is easy to parse. One
difficulty could be the additions XDV makes to the DVI format, if
they are not documented; then the specification must be guessed from
the sources. The DVI format itself is well documented by Knuth (see
the sources of dvitype, ...).
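
Locating the pages for step a) does not even require a full opcode
walk: the postamble points at the last bop, and each bop points back
at the previous one. A sketch, assuming a well-formed file (the
backpointer layout is from Knuth's DVI specification):

```python
import struct

def page_offsets(dvi):
    """Byte offsets of all bop commands in a .dvi/.xdv file, found
    by following the backpointer chain from the postamble."""
    i = len(dvi) - 1
    while dvi[i] == 223:                  # skip the trailing fill bytes
        i -= 1
    # dvi[i] is the format id byte; the 4 bytes before it (inside
    # post_post) point at post.
    post = struct.unpack(">I", dvi[i - 4:i])[0]
    # post: opcode 248, then a 4-byte pointer to the final bop.
    bop = struct.unpack(">I", dvi[post + 1:post + 5])[0]
    offsets = []
    while bop != 0xFFFFFFFF:              # p = -1 marks the first page
        offsets.append(bop)
        # bop: opcode 139, ten 4-byte \count values, then the pointer
        # to the previous bop.
        bop = struct.unpack(">I", dvi[bop + 41:bop + 45])[0]
    return list(reversed(offsets))
```

Splitting then means copying the byte ranges between these offsets
and rewriting the bop backpointers and the postamble; extracting the
specials of each page still needs the opcode walk within the page.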

Item b) needs some knowledge of PDF internals. At least implementing
it for a known, limited set of specials should not be too difficult.
But supporting more general cases can grow into a large software
project.
  For example, the set of specials needed for page 4 of the example
should consist of 2.1, 3.2, and 4.1. In this case 2.2 and 3.1 do no
harm, because they put the same values for the same dict keys. But
2.3 adds an additional value to the dict that is not needed; in this
case that might be tolerable.

Yours sincerely
  Heiko Oberdiek

