[pdftex] please make PDF ID field more deterministic

Nicolas Boulenguez nicolas at debian.org
Sun May 17 21:09:45 CEST 2015


Hello.

It would be convenient if, by default or with appropriate options,
pdfTeX produced reproducible output from given input files.  You may
find some motivation at
https://wiki.debian.org/ReproducibleBuilds/About#Why_do_we_want_reproducible_builds.3F
Despite the wording, few of the problems are specific to Debian.

By default, the CreationDate and ModDate fields reflect the build
date, but this can be overridden with
  \ifpdf\pdfinfo{/CreationDate($DATE)/ModDate($DATE)}\fi
or
  pdftex '\pdfinfo{/CreationDate($DATE)/ModDate($DATE)}\input{source.tex}'
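
For illustration, here is one way $DATE could be computed so that it
stays fixed across builds.  This is only a sketch: the helper name
pdf_date and the choice of timestamp are my assumptions, and the
"D:YYYYMMDDHHmmSSZ" layout is the UTC form of a PDF date string.

```python
import time

def pdf_date(epoch_seconds):
    """Format a Unix timestamp as a PDF date string in UTC.

    pdf_date is a hypothetical helper, not part of pdfTeX.
    """
    return time.strftime("D:%Y%m%d%H%M%SZ", time.gmtime(epoch_seconds))

# Assumed input: a fixed timestamp chosen by the packager, for example
# the time of the last change to the sources.
DATE = pdf_date(1431889785)
print(DATE)  # D:20150517190945Z
```

The resulting string can then be substituted for $DATE in either of
the two commands above.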

Unfortunately, the ID field is implemented as an MD5 hash of
the build date, the output directory path and the output file name,
so it changes with the time and place of every build.

A first suggestion is to avoid generating the ID field, which is
optional and not widely used.

A less intrusive option would be to provide a new primitive such as
\pdfsetoutputfileid{}.

My favorite suggestion would be to make the default ID depend on
- the configurable CreationDate instead of the current time from gmtime(),
- the output file name only, ignoring its directory path.
Even if everybody starts to use the ID field, I can hardly imagine how
these changes could create a collision.
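
To make the difference concrete, here is a rough model of both
schemes.  It is not the pdfTeX code: the function names, the string
encoding and the exact concatenation of the inputs are assumptions
made for the example.  Only the choice of inputs follows the
description above.

```python
import hashlib
import os.path

def current_id(build_time, output_dir, file_name):
    """Model of the current scheme: MD5 over build time, directory, name."""
    data = build_time + output_dir + "/" + file_name
    return hashlib.md5(data.encode("utf-8")).hexdigest()

def proposed_id(creation_date, file_name):
    """Model of the proposal: MD5 over CreationDate and base file name."""
    data = creation_date + os.path.basename(file_name)
    return hashlib.md5(data.encode("utf-8")).hexdigest()

# Two builds of the same document, at different times and places.
# All values below are made up for the demonstration.
date = "D:20150517190945Z"
a = current_id("20150517T190945Z", "/build/one", "doc.pdf")
b = current_id("20150518T080000Z", "/build/two", "doc.pdf")
c = proposed_id(date, "/build/one/doc.pdf")
d = proposed_id(date, "/build/two/doc.pdf")

print(a == b)  # False: the ID follows the build time and directory
print(c == d)  # True: the ID depends only on the date and file name
```

With the proposed inputs, two builds agree on the ID as soon as they
agree on CreationDate and on the output file name.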

The attached patch demonstrates the idea on
  https://foundry.supelec.fr/scm/viewvc.php/*checkout*/trunk/source/src/texk/web2c/pdftexdir/utils.c?revision=463&root=pdftex
It requires a trivial but less readable change in order to actually
compile: swap the declarations of the start_time_str global variable
and the printID() procedure.

--
Please CC me; I do not read this list regularly.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: utils.c.diff
Type: text/x-diff
Size: 1113 bytes
Desc: not available
URL: <http://tug.org/pipermail/pdftex/attachments/20150517/116bf598/attachment.bin>

