[XeTeX] \(pdf)mdfivesum

Ross Moore ross.moore at mq.edu.au
Wed Jul 1 22:25:32 CEST 2015


Hi Joseph,

On 01/07/2015, at 23:03, Joseph Wright <joseph.wright at morningstar2.co.uk> wrote:

> Hello all,
> 
> I have a request for a new primitive in XeTeX, not directly related to
> typesetting but I think useful. To understand why I'm asking, a bit of
> background would be useful.
> 
> The LaTeX team have recently taken over looking after catcode/charcode
> info for the Unicode engines from the previous rather diffuse situation.
> As part of that, we were asked to ensure that the derived data was
> traceable and so have included the MD5 sum of the source files in the
> new unicode-letters.def file.

MD5 sums are also required data under some of the modern PDF standards, such as PDF/A and PDF/UA, especially whenever attachments are included.
They are part of the bookkeeping data that can be used to ensure that embedded files are indeed what was intended, and have not been intercepted and changed by malware.

> We can happily generate that file using pdfTeX (\pdfmdfivesum primitive)
> or LuaTeX (using Lua code), but not using XeTeX. That's not a big issue
> but the need for an MD5 sum gives me an idea which would need support in
> XeTeX.
> 
> LaTeX offers \listfiles to help us track down package version issues but
> this fails if files have been locally modified or don't have
> date/version info. It would therefore be useful to have a system that
> can ensure that files match, which is where MD5 sums come in. One can
> imagine arranging that every file \input (or \read) has the MD5 sum
> calculated as part of document typesetting: this is not LaTeX-specific.
> This data could then be available as an additional file listing to help
> track problems. However, to be truly useful this would need to work with
> all three major engines, and currently XeTeX is out. I'd therefore like
> to ask that \pdfmdfivesum (or perhaps just \mdfivesum) is added to XeTeX.

I fully support this request.
Issues of guaranteeing fidelity and conformance to standards are actually quite important in areas other than academia.
It is time TeX caught up with regard to such issues.


> There are a small number of other 'utility' primitives in pdfTeX/LuaTeX
> (some in the latter as Lua code emulation) that might also be looked at
> at the same time (see
> http://chat.stackexchange.com/transcript/message/22496265#22496265):
> 
> - \pdfcreationdate
> - \pdfescapestring
> - \pdfescapename
> - \pdfescapehex
> - \pdfunescapehex
> - \pdfuniformdeviate
> - \pdfnormaldeviate
> - \pdffilemoddate
> - \pdffilesize
> - \pdffiledump
> - \pdfrandomseed
> - \pdfsetrandomseed

Several of these are definitely needed when generating PDFs that conform to existing standards, particularly with regard to attached or embedded files.

- \pdffilemoddate
- \pdfcreationdate
- \pdffilesize

Of course, it is not hard to get such information from command-line utilities when the files to be included already exist before the typesetting job starts.
But where TeX itself writes out the files before re-reading them for inclusion, the coding is much easier when such primitives are available within the engine. Otherwise one needs to code a call-out to command-line utilities and then read back the output. This introduces operating-system dependencies, which is something that we definitely want to avoid with TeX systems.
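
As a rough illustration of that write-then-query case, here is a minimal sketch for plain pdfTeX, which already has these primitives. The file name attach-demo.txt and the macro names \attachout, \attachsize, \attachdate and \attachsum are made up for the example; only \pdffilesize, \pdffilemoddate and \pdfmdfivesum are real pdfTeX primitives.

```tex
% Minimal sketch; run with: pdftex example.tex
% Hypothetical names: attach-demo.txt, \attachout, \attachsize, ...

% Step 1: TeX itself writes the file that would later be attached.
\newwrite\attachout
\immediate\openout\attachout=attach-demo.txt
\immediate\write\attachout{Hello, attachment.}
\immediate\closeout\attachout

% Step 2: query the freshly written file from inside the engine.
% All three primitives are expandable, so \edef captures their results.
\edef\attachsize{\pdffilesize{attach-demo.txt}}
\edef\attachdate{\pdffilemoddate{attach-demo.txt}}
\edef\attachsum{\pdfmdfivesum file {attach-demo.txt}}

% Step 3: report the values in the log; no shell call-out is involved.
\message{size=\attachsize, date=\attachdate, md5=\attachsum}
\bye
```

The stored values could then be passed on to whatever code writes the embedded-file dictionary, without any dependence on the operating system.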

> 
> most of which are not related to PDF output and which may have good use
> cases. I am specifically *not* asking for any of these to be added here
> but note this list as it *may* be that the work may be closely related.
> --
> Joseph Wright

Hope this helps,

    Ross



