[luatex] CIDSet in PDF/A documents

Reinhard Kotucha reinhard.kotucha at gmx.de
Thu Jun 25 04:35:29 CEST 2015


On 2015-06-24 at 08:57:35 +0200, luigi scarso wrote:

 > Validating a pdf/a-1b is quite a complicated task, so in real
 > life you want to validate with the "best" validator, and cross your
 > fingers.  Acrobat Pro is one of the best validators around, so if it
 > says that it's ok, then you have strong reasons to say that it's
 > ok.

I doubt that Acrobat Pro is the best choice.  Its source code isn't
available, hence you don't know what it actually does.  Former
versions ignored most errors, and I don't know whether the latest
version checks everything now.  The Isartor Test Suite can be used to
validate a validator, but you certainly don't want to run all of
these tests unless the validator has a command-line interface that
lets you script them.  Another big drawback is that Acrobat Pro isn't
available on all platforms.

And people happily created zillions of invalid PDF/A-1b files because
they blindly relied on former versions of Acrobat Pro.

 > In my opinion, pdf/a is a great thing, but the lack of free &
 > solid pdf/a-1a validators still limits its adoption.

IMO Apache PDFBox is the most promising tool.  It's written in Java
and thus available on all platforms.  It's free software, hence the
sources are available.  AFAIK it supports PDF/A-1b only, but if you
need a validator for PDF/A-1a, I think that it makes more sense to
extend PDFBox rather than start from scratch or rely on proprietary
software.

Furthermore, the PDFBox PDF/A-1b validator is just an application
program which makes use of a small subset of a powerful PDF library.
I'm not familiar with Java at all, but I'm convinced that the
library is a good starting point for other projects.

When I worked on the PDF/A-1b stuff it turned out that a useful
validator has to fulfil at least these requirements:

  1. It has to work on all platforms.

  2. It must be possible to run it on the command-line or from a
     makefile. 

  3. Whenever possible, it shouldn't abort after the first error.

  4. The source code must be available.


Items 2 and 3 are obvious if you test your software while you're
writing it (which IMO saves a lot of time).  When I wrote the XMP file,
I was aware that the timestamps in the PDF file and in the XMP header
were different, but I can't fix everything at the same time.  Validators
which aborted after the first error turned out to be less helpful,
because they hid the remaining problems behind one I already knew about.

Besides item 1, item 4 is the most important one.  I wouldn't have
started this discussion at all if the sources of the PDFTron validator
had been available.

With that said, Acrobat Pro is certainly one of the worst tools of all,
and Apache PDFBox seems to be the most promising one because it's free
software, with all the advantages free software offers.

I recommend that TeX users who are familiar with Java programming have
a look at

  https://pdfbox.apache.org/

Regards,
  Reinhard

-- 
------------------------------------------------------------------
Reinhard Kotucha                            Phone: +49-511-3373112
Marschnerstr. 25
D-30167 Hannover                    mailto:reinhard.kotucha at web.de
------------------------------------------------------------------

