[XeTeX] [arXiv #128410] Re: XeLaTeX generated pdf metadata

Tue Sep 23 09:18:04 CEST 2014

I have had similar problems with PubMedCentral.  While I was a Wellcome
Trust Senior Research Fellow, I was contractually obliged to submit all my
publications to PMC.  But in every case it took over a year for my work to
appear, and involved a huge wrangle about XML, XeTeX, and conversion.

PMC uses tools to convert the author's PDF into XML.  Then they generate a
new PDF from their XML.  They publish their own XML and their own PDF.

I get it that they want XML.  But their conversion pipeline is not good for
complex work, especially if it includes Unicode characters.  Their
re-generated PDFs were a complete mess and my articles were quite literally
unreadable.  (And the page numbers were all changed, making page references
ambiguous.)  Admittedly, my articles use Sanskrit in Unicode and complex
layout formatting.  That's why I use XeTeX, of course.

For an example, see especially pp. 211 onwards of my article here:

   - http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2772122/

In the end, PMC agreed that their tech could not handle my writings, so
they published my PDF and no XML.

It sounds as if arXiv is facing similar difficulties.  The best way forward
for arXiv and PMC is to identify authors who are knowledgeable about
advanced document processing (i.e., the members of this list!), and talk to
them in a cooperative spirit about complex documents, metadata, and
conversion issues.  This would be better than treating such authors as
"difficulties."

Best,

Dominik

On 23 September 2014 08:16, <mskala at ansuz.sooke.bc.ca> wrote:

> On Tue, 23 Sep 2014, Ross Moore wrote:
> > It is the insistence on being able to reproduce the PDF
> > *automatically from source* that is where the problem lies.
>
> From reading Norbert's Web blog, it appears that that's also an issue for
> Debian packaging of TeX-related software.  Debian has a formal requirement
> for everything that can possibly be built from source, to be built from
> source, and it's not practical to do that automatically with many
> TeX-related documentation files.  My own horoscop LaTeX package, whose
> documentation requires many megabytes of astrological software (free, but
> not typically packaged by Linux distributions) to compile properly, is
> only one example.  I think there are other packages that exist
> specifically to support expensive commercial products and require those
> products in order to compile, notwithstanding that the results of
> compilation are free to distribute.  This kind of thing is definitely a
> problem; I'm not sure it is TeX's problem.
>
> As for arXiv, what bothers me is that in the case of XeLaTeX, they will
> accept neither the source code *nor* the compiled PDF.  All an author can
> do is circumvent the rules by lying in the document metadata, or else go
> through contortions to compile a special arXiv-only version with some
> other software.  I found this page helpful in my efforts to do that:
>    http://member.ipmu.jp/yuji.tachikawa/cjk-on-arxiv/
>
> --
> Matthew Skala
> mskala at ansuz.sooke.bc.ca                 People before principles.
> http://ansuz.sooke.bc.ca/