[XeTeX] Producer entry in info dict

Wed Feb 29 01:31:39 CET 2012

On Wed, Feb 29, 2012 at 09:57:04AM +1100, Ross Moore wrote:

> On 29/02/2012, at 8:44 AM, Heiko Oberdiek wrote:
>
> > Hello,
> >
> > the entries in the information dictionary can be controlled
> > at TeX macro level except for /Producer:
> >
> > % xetex --ini
> > \catcode`\{=1
> > \catcode`\}=2
> > \shipout\hbox{%
> >  \special{pdf:docinfo<<%
> >    /Producer(MyProducer)%
> >    /Creator(MyCreator)%
> >    /Author(MyAuthor)%
> >    /Title(MyTitle)%
> >    /Subject(MySubject)%
> >    /Keywords(MyKeywords)%
> >    /CreationDate(D:20120101000000Z)%
> >    /ModDate(D:20120101000000Z)%
> >    /MyKey(MyValue)%
> >>> }%
> > }
> > \csname @@end\endcsname\end
>
> Surely  /Creator  is (La)TeX, Xe(La)TeX, ConTeXt, etc.
> while   /Producer  is the PDF engine:
>    Ghostscript, xdvipdfmx, pstopdf, Acrobat Distiller, etc.
> and  /Author  is the person who wrote the bulk of
> the document source.
>
> Why should it be reasonable that an author can set the
>  /Producer and /Creator  arbitrarily within the document
> source?

What's wrong with mentioning "XeTeX", for example?

Another reason: PDF/A requires that the information
data are duplicated in the XMP part. If a package
like hyperxmp tries to write the data as XMP, then
it has no chance to know the /Producer value.

> The author chooses his workflow, and should pass this
> information on to the appropriate package ...

He can't.

> > The entry for /Producer gets overwritten by xdvipdfmx,
> > e.g. "xdvipdfmx (0.7.8)". Result:
> >
> > * Bug-reports/hyperref: pdfproducer={XeTeX ...} does not work.
> > * hyperxmp is at a loss, it *MUST* know the value of the
> >  /Producer, because the setting in the XMP part has to be
> >  the same.
>
>   ... via options to  \usepackage[...]{hyperxmp}
>
> and the package should be kept up-to-date with the exact strings
> that will be produced by the different processing engines, in all
> their existing versions.

That would need clairvoyance. The (x)dvipdfm(x) driver
is running *AFTER* the TeX processing part.

> I know that one processor cannot know in advance how its output
> will be further processed, but that is not the point of XMP.

Then the (x)dvipdfm(x) driver must also fix the producer entry
in the XMP part. Changing it only in the information dictionary
violates a requirement for PDF/A.

> The person who is the author, or production editor, *does* know
> this information (at least in principle) and should ensure that
> this gets encoded properly within the final PDF

Yes, there are users that detect that the intended producer entry
is *NOT* the one that gets written in the final PDF and write
bug reports.

> --- if complete
> validation against an existing standard is of any importance.

Violation against PDF/A, for instance.

> > Please fix this issue in xdvipdfmx.
>
> I'm not sure that it is  xdvipdfmx's duty to handle this
> issue; though see my final words below.
>
> My initial thoughts are as follows:
>
> The nature and purpose of XMP  is such that an author
> cannot just  \usepackage{hyperxmp}   with no extra options,
> and expect the XMP information to be created automagically,
> correctly in every detail.

There is no reason that the trivial stuff (pdf information
entries) should not be working.

> The alternative is to have an auxiliary file that contains
> macro definitions, to be used both in the  docinfo  and XMP.
> This auxiliary file needs to be created either manually,
> or automatically extracting the information from a PDF,
> first time it is created.

I don't see any reliable way.

Even if you say goodbye to TeX and do the stuff by
a program that fixes the PDF file afterwards, there
is no warning or hint that the PDF file generated
by XeTeX is wrong.

You can't expect a user to know how to get
the version information of xdvipdfmx. He calls XeTeX
and usually does not even know that he is indirectly
running xdvipdfmx. Also the producer strings change
from driver to driver and from version to version.
Also several instances of a driver can be installed:
There is no way to catch the difference between
xetex -output-driver=xdvipdfmx078
and
xetex -output-driver=xdvipdfmx100
at TeX macro level.
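
To illustrate the limitation, here is a minimal sketch for plain XeTeX
(the macro name \engineinfo is only an illustration). The engine can be
identified through \XeTeXversion and \XeTeXrevision, but nothing
comparable is available for the output driver:

```tex
% Run with plain XeTeX, e.g.: xetex test.tex
\ifx\XeTeXrevision\undefined
  \message{Not running under XeTeX}%
\else
  % \XeTeXversion is an integer register; \XeTeXrevision expands
  % to a string such as ".9997".
  \edef\engineinfo{XeTeX \the\XeTeXversion\XeTeXrevision}%
  \message{\engineinfo}%
  % The output driver and its version cannot be queried here.
\fi
\bye
```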

> With PDF/A and PDF/UA the PDF file is not supposed to be
> compressed,

s/PDF file/PDF information dictionary/;

With XeTeX you need a command line like:
xetex -output-driver='xdvipdfmx -V4'

With pdfTeX/LuaTeX the necessary settings can be done at
macro level and therefore put in a package without bothering
the user with low level stuff.
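
A minimal sketch of such macro-level settings for pdfTeX, assuming
pdfTeX 1.40 or later in PDF mode. The values are chosen only to mirror
the -V4 option above; this is not a complete PDF/A setup:

```tex
% Must be set before the first page is shipped out.
\pdfoutput=1
\pdfminorversion=4      % write PDF 1.4
\pdfobjcompresslevel=0  % do not put objects into object streams
```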

> BTW, what about the  /CreationDate  and  /ModificationDate ?
> Surely these should be set automatically too ?

pdfTeX:
/CreationDate and /ModDate are set automatically
unless they are specified by the user.
(x)dvipdfmx:
/CreationDate is set automatically unless specified by the user;
/ModDate is set only if specified by the user (see test file).

The same would be nice for /Producer, easy to implement
and document.

> Doesn't  pdfTeX  have the means to do this?

Works fine. Without an explicit setting, the date of the pdfTeX run
is used for /CreationDate and /ModDate. But both
can be overridden:

\catcode`\{=1
\catcode`\}=2
\pdfoutput=1
\pdfobjcompresslevel=0
\pdfinfo{%
/CreationDate(D:20120101000000Z)%
/ModDate(D:20120202000000Z)%
}
\shipout\hbox{}
\csname @@end\endcsname\end

> Of course when it is a 2-engine process, such as
>   XeTeX + xdvipdfmx
> then which time should be encoded here?

And there are situations where the time needs to be known
at the TeX macro level.

> XeTeX cannot know the time at which  xdvipdfmx  will do
> its work.  Maybe it can extrapolate ahead, from information
> saved from the previous run ?

The current behaviour seems fine to me. If nothing is
specified, the /CreationDate is set automatically.
Otherwise, /CreationDate and/or /ModDate can be
set by \special{pdf:docinfo ...}.

> So maybe what is really desirable is for  xdvipdfmx  to write
> out an auxiliary file containing all relevant metadata, including
> timings, that can then be used by the next run of  XeLaTeX .
> A  \special{ ... }  command could be used to trigger the need
> for such an action to be performed.
>
> Is that what you had in mind?

I don't see a need to make things complicated.
For example, if /Producer behaves like /CreationDate,
then the problem is solved.
Of course, such flexibility allows the user to lie
and put "Word" or worse in the producer entry.
But if a user wants to lie, then it can be done anyway
(see Ross' PDF postprocessing suggestion). However,
without such flexibility, legitimate producer
settings like "XeTeX ..." cannot be specified
in the TeX file. If xdvipdfmx wants to be in the
producer entry, then I can change the default in
hyperref to something like "XeTeX 0.9997/xdvipdfmx"
if XeTeX also provides the output driver name
(e.g. \XeTeXoutputdriver) that is known to XeTeX.
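
As a sketch only: \XeTeXoutputdriver is the primitive proposed above and
does NOT exist in XeTeX, and \myproducer is an illustrative name. With
the current xdvipdfmx the /Producer entry written here would still be
overwritten, so this shows the intended usage under both assumptions:

```tex
% HYPOTHETICAL: \XeTeXoutputdriver is a proposed, non-existent primitive.
\ifx\XeTeXoutputdriver\undefined
  % Fallback: only the engine is known at macro level.
  \edef\myproducer{XeTeX \the\XeTeXversion\XeTeXrevision}%
\else
  \edef\myproducer{XeTeX \the\XeTeXversion\XeTeXrevision/\XeTeXoutputdriver}%
\fi
% Takes effect only if the driver honours a user-supplied /Producer.
\special{pdf:docinfo << /Producer (\myproducer) >>}
```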

Yours sincerely
Heiko Oberdiek