[pdftex] pdftosrc can't extract non-source files

Reinhard Kotucha reinhard.kotucha at web.de
Fri Jul 26 00:30:14 CEST 2013

On 2013-07-23 at 23:07:01 -0400, x wrote:

 > pdftosrc only extracts embedded files of type source, which seems
 > like an entirely arbitrary restriction.  This prevented it from
 > extracting the embedded files


 > okular can extract these files, manually, but isn't suitable for
 > batch operations.
 > Also, it would be helpful to show an index of the attached files.

Hi Mark,
admittedly I don't know the reason for this restriction either; I'm
not very familiar with the PDF specs.  However, your suggestion to
show an index of attached files is very specific.  More often I have
to extract fonts and font-related objects for debugging purposes.  I
fear that making pdftosrc more versatile would require an enormous
amount of manpower.

At EuroTeX 2012 Taco introduced the LuaTeX epdf library.  It allows
you to extract objects from PDF files.  Only a few functions are
implemented so far, but I was at least able to determine the number
of attached files in a PDF document.  I hope that Taco finds some
time to work on it in the near future.

I'm convinced that the epdf library is the best approach.  It's not
restricted to particular types of objects.  An index of attached files
isn't sufficient: each file is accompanied by metadata, which has to be
made accessible too.  pdftosrc returns a string on the command line,
whereas Lua returns data structures and is thus more powerful and
flexible.


Reinhard Kotucha                                      Phone: +49-511-3373112
Marschnerstr. 25
D-30167 Hannover                              mailto:reinhard.kotucha at web.de
Microsoft isn't the answer. Microsoft is the question, and the answer is NO.
