[XeTeX] xdvipdfmx, page subsets, pgf, transparency

Heiko Oberdiek heiko.oberdiek at googlemail.com
Mon May 16 15:55:17 CEST 2011


On Mon, May 16, 2011 at 07:57:14AM -0500, mskala at ansuz.sooke.bc.ca wrote:

> On Mon, 16 May 2011, Heiko Oberdiek wrote:
> > take some actions arising from specials. But these actions might
> > have effects on the whole document. Sometimes this is good
> > to avoid missing object declarations and other things. But sometimes
> > extra stuff or wrong stuff is added because of pages that the user
> > has explicitly excluded.
> 
> Thanks for investigating this.  It's a complicated and pretty obscure
> problem that would only affect a few users, so I appreciate your paying
> attention.
> 
> I excluded the *page*.  I didn't exclude the *special*.

No, the special is part of the page. Thus by excluding the page you
are also excluding the special that sits on it.

> I think it's a bug because A.

It's a bug in the documentation. Unfortunately, many DVI drivers that
allow page selection don't mention that this might not work correctly
if specials are involved.

> the software detects that it is writing incorrect output,
> and B. it gives me no way to get correct output.  There's no way to
> include the special except by including the page.

The problem is that a special does not say whether it belongs to the
current page only, to this and the following pages, or to the whole
document. That does not matter if the whole document is processed.
But the contents of a special are arbitrary user data, so it is not
possible for the DVI/XDV driver to classify specials in general.
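
For illustration (written from memory, so the exact special syntax may
differ, and @myobj is just a made-up name): all of the following lines
reach the driver as opaque specials, yet only the first is purely
local to its page, while the other two create and fill a named object
that any page of the document may reference:

  \special{pdf:content 1 0 0 RG}           % drawing code, page-local
  \special{pdf:obj @myobj << >>}           % document-level object
  \special{pdf:put @myobj << /CA 0.4 >>}   % later addition to it

A page-selecting driver would have to understand the contents of every
special to know which ones it may safely drop.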

> > > that won't meet your needs instead!" responses:  The small PDFs will be
> > > used with the pdfpages package in a second run of XeLaTeX to generate PDFs
> > > for printing the large document as a multi-volume set of books.
> >
> > The purpose of the second step is not clear to me:
> > The final result consists of several PDF files, one for each book?
> > And the first XDV file is just the contents of the books and perhaps
> > some pages are reused for each book, so that the second run
> > is only used for putting pages together?
> 
> The second run of XeTeX also adds some additional graphics, most notably
> thumb-tabs along the sides of the pages, and it changes the page size to
> accommodate that, so that the tabs will bleed all the way off the paper
> edge when printed and trimmed.  It's not purely a page-subsetting
> operation.  And I'd rather not make it purely a page-subsetting operation
> (by adding the thumb-tabs and changed size to the original large file)
> because that would mean generating two large files (one with thumb-tabs
> and one without; I also have a use for the large file in its current form)
> and the generation of the large file can't be parallelized (it's three
> long XeTeX runs that must be done sequentially on a single CPU).  The way
> I'm currently doing it means much of the work can be performed
> simultaneously on multiple CPUs and I can do at least some of the testing
> on just a single volume without having to generate the whole thing every
> time.
> 
> > Then I would suggest writing a program that deals with the XDV file:
> > a) splitting the single master .xdv file into the book .xdv files
> > b) analyzing the specials to add the missing ones to the .xdv file.
> 
> I'll do this if forced to, but since xdvipdfm is documented as being able
> to generate a page subset, I'd like to use it for its documented behaviour.

The documentation can be fixed. Nevertheless, that will not help you.

> > If I have understood the second XeTeX run, then this step wouldn't
> > be necessary, saving you much time.
> 
> Unfortunately, I don't think it'll be trivial to eliminate the recombining
> step, because it does more than pasting the small PDFs together.

It would have been a bonus if the tool could also produce the result
of the final recombination step directly, but that is not mandatory.
The tool only needs to write a page range of a .xdv file into a new
.xdv file, with the selected pages and their specials fixed.

> I'll
> continue playing with it, though; there may be a way to either eliminate
> the pasting step or use something faster than the pdfpages package.
> Right now the pdfpages package seems to be the real trouble spot for
> speed.

I don't think pdfpages is the problem. You can verify this by using
the low-level mechanism for including a PDF page. At that level you
can only specify a PDF file and a page number, so perhaps the whole
PDF file is read again and again for each included page. There is no
interface for opening a PDF file once and then referencing its pages
without rereading the file.
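
With XeTeX the low-level form is the \XeTeXpdffile primitive, roughly
like this (an untested sketch; volume1.pdf is only a placeholder
name):

  \documentclass{article}
  \begin{document}
  \XeTeXpdffile "volume1.pdf" page 7
  \end{document}

Timing a loop of such raw inclusions against the corresponding
\includepdf calls would show whether pdfpages itself or the repeated
reading of the large PDF is the real bottleneck.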

> If there is something similar to it that could take pages from an
> XDV file then I could eliminate the intermediate run of xdvipdfm, but
> then I'd still need to either do the page subsetting outside of XeTeX, or
> have the thing that pastes together the XDV files also do page subsetting
> and not have the same problem with specials that xdvipdfm has.
> 
> I proceeded with the "dummy image on an early page" workaround and that
> seems to work pretty well.

I wouldn't rely on it. In the case of transparency, the dummy graphic
should include all opacity values used in the document. See the
example that I gave in the previous post. If one value is missing, you
end up with an invalid PDF file, and this error is *not* detected by
xdvipdfmx.
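
A dummy picture along these lines should cover them (an untested
sketch; the two values are only examples, list every opacity value
that occurs anywhere in the document):

  \begin{tikzpicture}
    % invisible white-on-white dots, one per opacity value in use
    \fill[white, opacity=0.4] (0,0) circle (0.1pt);
    \fill[white, opacity=0.7] (0,0) circle (0.1pt);
  \end{tikzpicture}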

> I put on an early page a TikZ image consisting
> of a semi-transparent white circle, which is invisible against the white
> page background.  That causes the "pgfopacities" object to be emitted for
> that page.

But the object will probably be filled with data. The current pgf
version in TL2010 uses the object "pgfextgs", and each opacity value
in use adds an entry, e.g. "opacity=0.5" adds "/pgf@CA0.5 << /CA 0.5 >>"
to this object. "pgf@CA0.5" is then referenced in the page stream. If
it is missing, the PDF viewer might complain, or silently use a
default value instead, or ...
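
From memory (the details may differ between pgf versions), the driver
emits dvipdfmx specials roughly along these lines, where the last line
stands for what ends up in the page's content stream:

  \special{pdf:obj @pgfextgs << >>}                           % once
  \special{pdf:put @pgfextgs << /pgf@CA0.5 << /CA 0.5 >> >>}  % per value
  \special{pdf:content /pgf@CA0.5 gs}                         % use it

If the "pdf:put" line lives on an excluded page, the "gs" reference on
an included page points to nothing.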

> Then I added that page to all the subsets generated with
> xdvipdfm, and had all my invocations of pdfpages include pages starting
> with the second page of each subset, instead of all pages.

You should also check the other specials in use, to avoid
surprises.
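
(For completeness, the pdfpages call for that is just something like

  \includepdf[pages=2-]{subset-vol1.pdf}

where subset-vol1.pdf stands for one of your per-volume files and
pages=2- skips the dummy page.)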

> One small gotcha is that there apparently is a separate "pgfopacities"
> object generated for each distinct numerical value of opacity.  My actual
> images used "opacity=0.4" and the dummy image had to use that value too -
> an earlier attempt in which the dummy image used "opacity=0.1" generated a
> separate object that wasn't helpful in eliminating the error.

Yes, you have detected the problem already.

> It seems
> ridiculous to me that a single small number is a thing that must be
> created as a callable object and referred to by a name much longer than
> the number itself, instead of being included literally when it's used - I
> can't see how this indirection saves any time or space - but of course
> that's a PGF issue, nothing to do with xdvipdfm.

No, that's a PDF issue; the PDF specification requires it. An opacity
value cannot be written literally into the page's content stream: the
"gs" operator only accepts the name of an ExtGState entry in the
page's resource dictionary, hence the indirection.

Yours sincerely
  Heiko Oberdiek

