[pdftex] Blank lines are ignored when copying-and-pasting from PDF

James Quirk jjq at galcit.caltech.edu
Mon Jan 4 01:34:41 CET 2010


Martijn, Ross,

> > >Interesting workaround (attaching the key). The only problem I have with
> > >PDF
> > >attachments is that I want the PDF to work with all PDF readers. The PDF is
> > >an
> > >installation manual for my open source project and I want it to work even
> > >when
> > >the PDF reader does not support attachments.
There is an easy way around this problem which I discuss below.

> > >
> > >I tried to insert spaces etc. but it seems that all spaces are removed. I
> > >have
> > >searched the web quite intensively for a solution but haven't found one. You
> > >would think that this is problematic for others as well, for example when
> > >including source code snippets. I have tried the listings package (which is
> > >used for including source code) but it suffered from the same problem.
> >Another approach would be to use PDF's /ActualText facility.
> 
> Yes, this is a good way to go ...
Personally I would only use the /ActualText facility for small code 
fragments. Anything above half a dozen lines I would save as a file 
attachment, and to overcome Martijn's objection I would add a custom 
decoder to the PDF so as to allow the embedded files to be extracted, even 
when the end reader's PDF viewer cannot cope with attachments.

Consider, for example:

http://www.amrita-cfd.org/tex-group/self-unpacking.pdf

It contains LaTeX's listings package from my base TeX installation.

On my Linux box, the PDF can be viewed using:

   acroread (7.0.8,8.1.2,9.2)
   xpdf     (3.02+)
   evince   (2.2)

where the numbers in brackets are the versions I've tried.

Now if you want to unpack the files, you can run:

   perl -x self-unpacking.pdf -unpack 

or:

   ruby -x self-unpacking.pdf -unpack

depending on your fancy, and if you're security conscious
you can run:

   gpg --verify self-unpacking.pdf

to check that the PDF has not been tampered with. Of course
you still need to trust that I won't produce a malicious
document, but then again that's what digital signatures
are all about -- rings of trust; Ross knows me, but would
you trust Ross to vouch for me? And if Ross won't do,
would Scott Pakin? And so on.

Now if you examine self-unpacking.pdf in a text editor, you'll
see that it is wrapped in a gpg --clearsign signature.
This is possible because implementation note #13 of the spec
for PDF 1.7 states that Acrobat viewers allow the %PDF 
header to be anywhere in the first 1024 bytes of a file.
Similarly, the %%EOF trailer can appear anywhere in the
last 1024 bytes.
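
As a rough illustration of the header/trailer rule just described, here is a
minimal Python sketch that checks whether a file's %PDF header and %%EOF
trailer fall inside those 1024-byte windows. The function name and the sample
bytes are made up for the example, and it says nothing about how any
particular viewer actually scans a file.

```python
HEADER_WINDOW = 1024   # bytes searched for "%PDF-" at the start of the file
TRAILER_WINDOW = 1024  # bytes searched for "%%EOF" at the end of the file


def find_pdf_markers(data: bytes):
    """Return (header_offset, eof_offset); an entry is None when the
    marker does not lie entirely inside its window."""
    head = data.find(b"%PDF-", 0, HEADER_WINDOW)
    tail_start = max(0, len(data) - TRAILER_WINDOW)
    tail = data.rfind(b"%%EOF", tail_start)
    return (head if head >= 0 else None, tail if tail >= 0 else None)


# Hypothetical stand-in for a clearsigned PDF: armour text before and after.
prefix = b"-----BEGIN PGP SIGNED MESSAGE-----\nHash: SHA1\n\n"
body = b"%PDF-1.7\n% objects would go here\n%%EOF\n"
suffix = b"-----BEGIN PGP SIGNATURE-----\n(signature)\n-----END PGP SIGNATURE-----\n"

header_at, eof_at = find_pdf_markers(prefix + body + suffix)
print(header_at is not None and eof_at is not None)
```

So long as the signature armour on either side stays under 1024 bytes, both
markers remain within reach.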

You'll also see that the Perl and Ruby decoders are stored 
in objects 37 and 38, respectively. Actually, for expediency
the present Ruby decoder is a cheat, for it simply defers to the Perl
decoder. But a standalone Ruby decoder could easily be crafted.
It would also be a simple matter to craft one in either Lua or Python,
but neither of those languages supports the Perl/Ruby -x option and so
they would need to be bootstrapped via a URL.
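
For readers unfamiliar with -x: Perl discards everything up to the first line
that starts with #! and contains the string "perl", and stops reading at
__END__ if trailing garbage follows. The Python sketch below mimics that
scan on a byte string. It is only an illustration of the -x convention, not
the decoder stored in the PDF, and the function name and sample data are
invented for the example.

```python
def extract_embedded_script(data: bytes, interpreter: bytes = b"perl") -> bytes:
    """Return the script embedded in `data`, located the way -x locates it:
    start at the first line beginning with "#!" that mentions the
    interpreter, stop before a line consisting of __END__ (if any)."""
    lines = data.splitlines(keepends=True)
    for start, line in enumerate(lines):
        if line.startswith(b"#!") and interpreter in line:
            break
    else:
        raise ValueError("no #! line naming the interpreter was found")

    script = []
    for line in lines[start:]:
        if line.strip() == b"__END__":
            break
        script.append(line)
    return b"".join(script)


# Hypothetical file: unrelated leading text, a tiny script, trailing text.
sample = b"%PDF-1.7\njunk\n#!/usr/bin/perl\nprint 42;\n__END__\ntrailing\n"
script = extract_embedded_script(sample)
```

A Lua or Python decoder would have to perform some such scan itself, which is
why it cannot be launched directly from the PDF in the way the Perl and Ruby
ones can.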

Going back to the PDF, if you have JavaScript and /RichMedia annotations 
enabled you'll be able to click on a bookmark to launch an interactive 
source browser so as to view the attached files. Unfortunately, owing to 
an Adobe bug this facility is currently broken under OS X (sorry Ross), 
although I've been told it will be fixed in the next routine update, which 
is rumoured to be Jan 22nd or thereabouts.

Anyhow the purpose of the present example is to show that with a bit of 
lateral thinking, source-code and the like can be bound to a PDF in a way 
that eliminates the need for cut-and-pasting, which as Martijn found can 
all too often cause problems.

Lastly, the example's cryptic reference to 2360 is easily explained when 
you consider 2360 = 2010 + 350 and 2010 - 350 = 1660, the year The Royal Society 
was founded. At the risk of being flamed, I'm wondering if our 
descendants, 350 years hence, will still be cut-and-pasting and TeX-ing. 
For the love of computational science, I sincerely hope not!

James

