[pdftex] pdftex deterministic?

Tue Apr 3 15:43:50 CEST 2007

On Tue, 2007-04-03 at 14:25 +0200, Hans Hagen wrote:
> Matteo Centonza wrote:
> > On Sun, 2007-04-01 at 21:46 +0200, Thanh Han The wrote:
> >   
> >> I have made a script to compare 2 pdfs, which I use to
> >> quickly test whether a new version of pdftex produces
> >> different output from the previous version. The script uses
> >> gs or pdftoppm to generate all pages of the pdfs as bitmaps,
> >> then compare them page by page using diff. If they are
> >> different, they are further compared by the 'compare' tool
> >> from ImageMagick. If the difference is larger than a
> >> threshold, an image showing the differences is shown.
> >>
> >> If anyone is interested in the script I will send it.
> >>     
> >
> > Hi Thanh,
> >
> > I suppose this could be a very time-consuming operation
> > when you try to compare huge PDFs. In my experience,
> > even a smart diff (diffing two files while excluding known
> > differences caused by e.g. a different filename, different IDs, etc.)
> > can take many times longer than matching checksums
> > (by a factor that can exceed 20).
> >   
> you can also use pdftotext and then compare the text files
> > Using checksums is the cheapest way of comparing two files
> > (even cheaper than a byte comparison) and gives you absolute
> > confidence in the result.
> >
> > The only drawback is that you have to slightly modify the driver
> > to make this possible (e.g. fixing ID differences),
> > but this could be an explicit option given to pdftex to let
> > it strip runtime info and focus on the content. For
> > final production files you simply rerun pdftex
> > with this option turned off.
> >   
> you can operate on copies of the pdf where you remove the problematic data
> > I ask because in my environment I need to compare
> > the source LaTeX document with the one produced by my production system,
> > and since I need the fastest comparison, I require (for now) PS checksum
> > equivalence in order to catch regressions.
> >   
> > I will surely need to do the same with PDFs once I move to pdftex
> > (and I'd much prefer to have an official pdftex option ;))
> >
> > Is this feasible, or does it clash with any PDF specs?
> >   
> It clashes with the objective of the pdftex project to produce valid pdf files at all times. 
> 
> Adding options to generate faulty, non-conforming pdf will only confuse users, put an extra burden on development, manual writing, maintenance, etc., and once we start walking that road there will be no end. 
> 
> Of course, for your purpose you can decide to use a patched version of pdftex.  

Thanks for the response, Hans.

I'll surely go this route, since for me any other approach,
for timing or integrity requirements, is a no-go.

If I ever produce a conformant PDF, I'll try to post a patch with my
mods and see if there's interest in it.

-m
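
The suggestion made in the thread (compare checksums of copies of the
PDF from which the run-dependent data has been removed) could be
sketched roughly as follows in Python. This is only an illustration,
not something pdftex provides: the helper names normalized_digest and
same_content are made up, and the sketch assumes that the /ID array and
the /CreationDate and /ModDate strings appear uncompressed in the file
and are the only data that changes between runs. The original files are
never modified; only in-memory copies are hashed.

```python
import hashlib
import re

# Fields assumed to vary between otherwise identical runs.
# Assumption: they are stored uncompressed, which is not guaranteed
# for every PDF producer or every pdftex setting.
VOLATILE_PATTERNS = [
    # Trailer file identifier, e.g. /ID [<0123...> <4567...>]
    re.compile(rb"/ID\s*\[\s*<[0-9A-Fa-f]*>\s*<[0-9A-Fa-f]*>\s*\]"),
    # Info dictionary timestamps, e.g. /CreationDate (D:20070403154350+02'00')
    re.compile(rb"/(?:CreationDate|ModDate)\s*\(D:[^)]*\)"),
]


def normalized_digest(pdf_bytes: bytes) -> str:
    """Return a SHA-256 digest of the PDF bytes with volatile fields removed.

    Hypothetical helper: the stripped copy is used for hashing only and
    is not a valid PDF, since removing bytes shifts the xref offsets.
    """
    data = pdf_bytes
    for pattern in VOLATILE_PATTERNS:
        data = pattern.sub(b"", data)
    return hashlib.sha256(data).hexdigest()


def same_content(path_a: str, path_b: str) -> bool:
    """Compare two PDF files by the digest of their stripped copies."""
    with open(path_a, "rb") as file_a, open(path_b, "rb") as file_b:
        return normalized_digest(file_a.read()) == normalized_digest(file_b.read())
```

With this, same_content("old.pdf", "new.pdf") (file names are
placeholders) returns True when the two files differ only in the fields
listed above. Any other run-dependent data, such as embedded file names,
would need its own pattern.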