[pdftex] pdftex deterministic?

Thanh Han The hanthethanh at gmail.com
Sun Apr 1 21:46:12 CEST 2007


I have written a script to compare two PDFs, which I use to
quickly test whether a new version of pdftex produces
different output from the previous version. The script uses
gs or pdftoppm to render every page of each PDF as a bitmap,
then compares the bitmaps page by page using diff. Pages that
differ are compared further with the 'compare' tool
from ImageMagick. If the difference is larger than a
threshold, an image highlighting the differences is displayed.
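
The workflow just described can be sketched roughly as follows. This is
not the script itself, only a minimal Python outline under some
assumptions: pdftoppm and ImageMagick's compare are on the PATH, the
function names, the 72 dpi resolution and the threshold of 100 pixels
are made up for illustration, and filecmp stands in for the diff step.

```python
import filecmp
import subprocess
import tempfile
from pathlib import Path


def render_pages(pdf: Path, outdir: Path, dpi: int = 72) -> list[Path]:
    """Rasterize every page of `pdf` into `outdir` as PPM files (pdftoppm)."""
    outdir.mkdir(parents=True, exist_ok=True)
    subprocess.run(
        ["pdftoppm", "-r", str(dpi), str(pdf), str(outdir / "page")],
        check=True,
    )
    # pdftoppm zero-pads page numbers, so a plain sort keeps page order.
    return sorted(outdir.glob("page-*.ppm"))


def differing_pages(pages_a, pages_b):
    """Return 1-based page numbers whose bitmaps are not byte-identical."""
    changed = [
        i for i, (a, b) in enumerate(zip(pages_a, pages_b), start=1)
        if not filecmp.cmp(a, b, shallow=False)
    ]
    # Pages present in only one of the two documents count as differences.
    shorter, longer = sorted((len(pages_a), len(pages_b)))
    changed.extend(range(shorter + 1, longer + 1))
    return changed


def pixel_difference(a: Path, b: Path, diff_image: Path) -> float:
    """Count differing pixels with `compare -metric AE`; writes diff_image."""
    proc = subprocess.run(
        ["compare", "-metric", "AE", str(a), str(b), str(diff_image)],
        capture_output=True, text=True,
    )
    # compare exits with 0 (similar) or 1 (dissimilar); 2 signals an error.
    if proc.returncode not in (0, 1):
        raise RuntimeError(proc.stderr.strip())
    # The metric is printed on stderr; take the first token as the count.
    return float(proc.stderr.split()[0])


def compare_pdfs(old_pdf, new_pdf, threshold: float = 100.0, dpi: int = 72):
    """Return {page: differing pixel count} for pages above `threshold`."""
    report = {}
    with tempfile.TemporaryDirectory() as tmp:
        old_pages = render_pages(Path(old_pdf), Path(tmp) / "old", dpi)
        new_pages = render_pages(Path(new_pdf), Path(tmp) / "new", dpi)
        common = min(len(old_pages), len(new_pages))
        for page in differing_pages(old_pages, new_pages):
            if page > common:
                report[page] = float("inf")  # page exists in one PDF only
                continue
            diff_png = Path.cwd() / f"diff-page-{page}.png"
            count = pixel_difference(
                old_pages[page - 1], new_pages[page - 1], diff_png
            )
            if count > threshold:
                report[page] = count  # keep the diff image for inspection
            else:
                diff_png.unlink(missing_ok=True)
    return report
```

Calling compare_pdfs("old.pdf", "new.pdf") would return an empty dict
when no page differs by more than the threshold, and otherwise leave one
diff-page-N.png per offending page in the current directory. The cheap
byte comparison runs first so that compare is only invoked on pages that
actually changed.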

If anyone is interested in the script I will send it.

Thanh

On Sat, Mar 31, 2007 at 02:04:48PM -0300, George N. White III wrote:
> On 3/31/07, Geoffrey Alan Washburn <geoffw at cis.upenn.edu> wrote:
>
> >         I could swear I had read something about this in the past, but I
> > couldn't remember the correct keywords to find anything via search.  In
> > any event, I recently wanted to make some changes to the source of a
> > document and to make sure that these changes did not actually affect the
> > document I tried diffing the before and after PDFs.  Unfortunately,
> > after some further experimentation it does not seem that even repeated
> > runs of the same document produce identical output.  Is there any way I
> > can modify my documents or the parameters to pdftex to produce identical
> > output on identical inputs?  I realize this very well may not be
> > possible, and if so, what alternatives do people use in practice?  Thanks!
>
> Acrobat has several types of side-by-side comparisons. There are other
> commercial tools, (one advantage of using .pdf format is that there
> are lots of people using it, so you can draw on general-purpose tools
> from outside the TeX community) but I have no experience with them:
>             <http://www.zizasoft.com>, <http://www.docucomp.com/>
>
> Did you try "diff --text" (e.g., with diff from GNU diffutils 2.8.1)?
>
> Dates and some generated "ID" are stored in the pdf file so a plain
> "diff" always says: "Binary files 1/foo.pdf and 2/foo.pdf differ".
> If you use "diff --text 1/foo.pdf 2/foo.pdf" you should get
> something like:
>
> 405,406c405,406
> < /CreationDate (D:20070324132126-03'00')
> < /ModDate (D:20070324132126-03'00')
> ---
> > /CreationDate (D:20070331131842-03'00')
> > /ModDate (D:20070331131842-03'00')
> 436c436
> < /ID [<E1671B8E332FB4F759BF968FAE32724A> <E1671B8E332FB4F759BF968FAE32724A>] >>
> ---
> > /ID [<FB02C8CC764462DED0414047DF118FBC> <FB02C8CC764462DED0414047DF118FBC>] >>
>
> pdftool (from Artifex fitz <http://ccxvii.net/apparition/>) can
> extract individual objects for analysis after diff identifies a
> problem.  Another approach is to compare rasterized pages using image
> differences.
>
> It is worth the effort to get pdftool (and apparition works well on
> machines that get bogged down by acroread).   I don't know if any
> linux distro has binaries, but everything but the jbig2dec library is
> widely available in linux package form.
>
>   -------------- (from the README) ---------------
> PREREQUISITES
>
>  Before compiling Fitz you need to install third-party dependencies.
>
>    zlib
>    libjpeg
>    libpng
>    freetype2
>    expat
>
>  There are a few optional dependencies that you don't strictly need.
>  You will probably want the versions that Ghostscript maintains.
>
>     jbig2dec
>     jasper
>
>  Fitz uses the Perforce Jam build tool. You need the Perforce version 2.5
>  or later. Earlier versions (including the FTJam fork) have crippling bugs.
>  Boost Jam is not backwards compatible. If you do not have a compiled
>  binary for your system, you can find the Jam homepage here:
>    <http://www.perforce.com/jam/jam.html>
>               -------------------------------------------------------------------------------
>
> --
> George N. White III <aa056 at chebucto.ns.ca>
> Head of St. Margarets Bay, Nova Scotia
> _______________________________________________
> pdftex mailing list
> pdftex at tug.org
> http://tug.org/mailman/listinfo/pdftex
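
Since the "diff --text" output quoted above shows only /CreationDate,
/ModDate and /ID changing between runs, another option is to blank those
fields before comparing the files. The following is a minimal Python
sketch, not an established tool: the function names are invented, and it
assumes the entries appear uncompressed in the file, as they do in the
quoted diff.

```python
import re

# Fields that change on every run; patterns follow the quoted diff output.
VOLATILE = [
    (re.compile(rb"/(CreationDate|ModDate)\s*\(D:[^)]*\)"), rb"/\1 (D:0)"),
    (
        re.compile(rb"/ID\s*\[\s*<[0-9A-Fa-f]*>\s*<[0-9A-Fa-f]*>\s*\]"),
        rb"/ID [<0> <0>]",
    ),
]


def strip_volatile(data: bytes) -> bytes:
    """Replace dates and the trailer ID with fixed placeholders."""
    for pattern, replacement in VOLATILE:
        data = pattern.sub(replacement, data)
    return data


def same_apart_from_metadata(a: bytes, b: bytes) -> bool:
    """True if two PDFs differ only in the volatile fields above."""
    return strip_volatile(a) == strip_volatile(b)
```

With the bytes of 1/foo.pdf and 2/foo.pdf read via
pathlib.Path(...).read_bytes(), same_apart_from_metadata would report
whether anything other than the dates and the ID changed. The result is
only used for comparison and should not be written back as a PDF.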

