[pdftex] pdftex deterministic?

George N. White III gnwiii at gmail.com
Sat Mar 31 19:04:48 CEST 2007


On 3/31/07, Geoffrey Alan Washburn <geoffw at cis.upenn.edu> wrote:

>         I could swear I had read something about this in the past, but I
> couldn't remember the correct keywords to find anything via search.  In
> any event, I recently wanted to make some changes to the source of a
> document and to make sure that these changes did not actually affect the
> document I tried diffing the before and after PDFs.  Unfortunately,
> after some further experimentation it does not seem that even repeated
> runs of the same document produce identical output.  Is there any way I
> can modify my documents or the parameters to pdftex to produce identical
> output on identical inputs?  I realize this very well may not be
> possible, and if so, what alternatives do people use in practice?  Thanks!

Acrobat has several types of side-by-side comparison. There are other
commercial tools (one advantage of the .pdf format is that lots of
people use it, so you can draw on general-purpose tools from outside
the TeX community), but I have no experience with them:
            <http://www.zizasoft.com>, <http://www.docucomp.com/>

Did you try "diff --text" (e.g., with diff from GNU diffutils 2.8.1)?

Dates and a generated "ID" are stored in the pdf file, so a plain
"diff" always says: "Binary files 1/foo.pdf and 2/foo.pdf differ".
If you use "diff --text 1/foo.pdf 2/foo.pdf" you should get
something like:

405,406c405,406
< /CreationDate (D:20070324132126-03'00')
< /ModDate (D:20070324132126-03'00')
---
> /CreationDate (D:20070331131842-03'00')
> /ModDate (D:20070331131842-03'00')
436c436
< /ID [<E1671B8E332FB4F759BF968FAE32724A> <E1671B8E332FB4F759BF968FAE32724A>] >>
---
> /ID [<FB02C8CC764462DED0414047DF118FBC> <FB02C8CC764462DED0414047DF118FBC>] >>
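
If those three fields are the only expected differences, they can be
blanked out before comparing. Below is a minimal Python sketch of that
idea. The function names are made up for illustration, and it assumes
the /CreationDate, /ModDate and /ID entries appear uncompressed in the
file, as in the diff output above. A PDF written with compressed object
streams may hide them, and then this approach will not work.

```python
import re

# Patterns for the fields that change on every run, as seen in the diff output.
# Assumption: they are stored as plain, uncompressed text in the PDF.
_VOLATILE = [
    re.compile(rb"/CreationDate\s*\([^)]*\)"),
    re.compile(rb"/ModDate\s*\([^)]*\)"),
    # The two hex strings of /ID may be separated by a space or a newline.
    re.compile(rb"/ID\s*\[\s*<[0-9A-Fa-f]*>\s*<[0-9A-Fa-f]*>\s*\]"),
]


def strip_volatile(data: bytes) -> bytes:
    """Return the PDF bytes with the date and ID entries removed."""
    for pattern in _VOLATILE:
        data = pattern.sub(b"", data)
    return data


def same_apart_from_metadata(path_a: str, path_b: str) -> bool:
    """Compare two PDF files byte for byte, ignoring dates and the ID."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        return strip_volatile(fa.read()) == strip_volatile(fb.read())
```

Calling same_apart_from_metadata("1/foo.pdf", "2/foo.pdf") then reports
whether anything other than those fields changed. Anything it flags
still needs a closer look with diff --text or an object-level tool.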

pdftool (from Artifex fitz <http://ccxvii.net/apparition/>) can
extract individual objects for analysis after diff identifies a
problem.  Another approach is to compare rasterized pages using image
differences.
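
The image-difference approach can be sketched in Python as follows,
assuming each page has already been rasterized to a binary PPM (P6)
file by some external tool (Ghostscript's ppmraw device is one option).
The helper names are hypothetical, and only 8-bit PPM data is handled.

```python
def read_ppm(data: bytes):
    """Parse a binary PPM (P6, maxval <= 255); return (width, height, pixels)."""
    tokens = []
    pos = 0
    while len(tokens) < 4:
        # Skip whitespace between header fields.
        while data[pos:pos + 1].isspace():
            pos += 1
        # Skip a comment that runs to the end of the line.
        if data[pos:pos + 1] == b"#":
            while data[pos:pos + 1] not in (b"\n", b""):
                pos += 1
            continue
        start = pos
        while pos < len(data) and not data[pos:pos + 1].isspace():
            pos += 1
        tokens.append(data[start:pos])
    pos += 1  # exactly one whitespace byte separates the header from the pixels
    magic, width, height, maxval = (
        tokens[0], int(tokens[1]), int(tokens[2]), int(tokens[3]))
    if magic != b"P6" or maxval > 255:
        raise ValueError("only 8-bit binary PPM (P6) is supported")
    pixels = data[pos:pos + 3 * width * height]
    if len(pixels) != 3 * width * height:
        raise ValueError("truncated pixel data")
    return width, height, pixels


def count_differing_pixels(ppm_a: bytes, ppm_b: bytes) -> int:
    """Count the pixels whose RGB values differ between two page images."""
    wa, ha, pa = read_ppm(ppm_a)
    wb, hb, pb = read_ppm(ppm_b)
    if (wa, ha) != (wb, hb):
        raise ValueError("page sizes differ")
    return sum(1 for i in range(0, len(pa), 3) if pa[i:i + 3] != pb[i:i + 3])
```

A count of zero for every page means the two documents render
identically at the chosen resolution, whatever the metadata says.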

It is worth the effort to get pdftool (and apparition works well on
machines that get bogged down by acroread).   I don't know if any
linux distro has binaries, but everything but the jbig2dec library is
widely available in linux package form.

  -------------- (from the README) ---------------
PREREQUISITES

 Before compiling Fitz you need to install third party dependencies.

   zlib
   libjpeg
   libpng
   freetype2
   expat

 There are a few optional dependencies that you don't strictly need.
 You will probably want the versions that Ghostscript maintains.

    jbig2dec
    jasper

 Fitz uses the Perforce Jam build tool. You need the Perforce version 2.5
 or later. Earlier versions (including the FTJam fork) have crippling bugs.
 Boost Jam is not backwards compatible. If you do not have a compiled
 binary for your system, you can find the Jam homepage here:
   <http://www.perforce.com/jam/jam.html>
  -------------------------------------------------

-- 
George N. White III <aa056 at chebucto.ns.ca>
Head of St. Margarets Bay, Nova Scotia

