[pdftex] pdftex compression -- proposed addition to manual

Wed Aug 29 15:37:39 CEST 2001

I wrote:

  > On a different topic relating to compression, Otfried Cheong
  > e-mailed me off-list, saying that when pdftex reads a PNG
  > or TIFF image as an input, it unpacks it into an uncompressed
  > pixel array, and then applies no further compression except for
  > Flate. This would seem to conflict with the information
  > George N. White III gave in a previous post, or maybe their
  > statements are really not contradictory and there's something I
  > don't understand. (Otfried, I apologize if I'm misrepresenting
  > what you said, but when I tried to reply to your e-mail off
  > list and ask you to post it to the list, my e-mail bounced back.)

Otfried Cheong sent me the following off-list, and gave permission
for me to post it:

-----------

Actually my reply was directed both to you and the list, but
apparently I'm not allowed to post as I'm not currently subscribed to
the list (I read it through the web interface every now and then;
without a digest option, that's the only way for me).

Feel free to forward this to the list or not. :-)

I think George N. White III was just pointing out that a PDF file can
store images without using the specific image compression techniques
mentioned (JPEG, CCITT, etc.).

In fact, the easiest way of storing an image in PDF format is simply
as a stream of pixel values.  For an n by m image you'd have n*m
values, each specifying a color value in a certain way (there are
several "color models" such as RGB, grayscale, or color maps).
This stream can optionally be compressed using Flate.
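
To make that concrete, here is a minimal Python sketch of such an
object. It is not taken from the pdftex sources; the function name
image_xobject and the tiny all-red test image are made up for
illustration, and it assumes 8-bit RGB samples.

```python
import zlib

def image_xobject(width, height, rgb_bytes, level=9):
    """Wrap raw 8-bit RGB samples as a Flate-compressed PDF image stream.

    Hypothetical helper for illustration; not pdftex's own code.
    """
    # One RGB triple per pixel, so width * height * 3 bytes in total.
    assert len(rgb_bytes) == width * height * 3
    data = zlib.compress(rgb_bytes, level)
    header = (
        "<< /Type /XObject /Subtype /Image "
        f"/Width {width} /Height {height} "
        "/ColorSpace /DeviceRGB /BitsPerComponent 8 "
        f"/Filter /FlateDecode /Length {len(data)} >>\n"
    ).encode("ascii")
    return header + b"stream\n" + data + b"\nendstream\n"

# A 4 x 2 image in which every pixel is pure red.
pixels = bytes([255, 0, 0]) * (4 * 2)
obj = image_xobject(4, 2, pixels)
```

Leaving out the /Filter entry and writing rgb_bytes unchanged would
give the uncompressed variant of the same object.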

This embedding model is what pdftex uses for PNG or TIFF images.  The
images are read using a standard PNG or TIFF library, resulting in a
large array of pixel values, which are then written out to the PDF
file, applying Flate compression at the level set by \pdfcompresslevel.
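
A rough way to see what the compression level does to such a pixel
array is to run zlib over it directly. This sketch assumes the
setting maps straight onto zlib's levels 0 to 9, and the 256 by 256
grayscale gradient is a made-up stand-in for decoded image data.

```python
import zlib

# Hypothetical decoded image: a 256 x 256 grayscale gradient, one byte per pixel.
width = height = 256
pixels = bytes((x + y) % 256 for y in range(height) for x in range(width))

# Compress the same pixel array at three zlib levels (0 = store only).
sizes = {level: len(zlib.compress(pixels, level)) for level in (0, 1, 9)}
for level, size in sizes.items():
    print(f"level {level}: {size} bytes (raw: {len(pixels)})")
```

At level 0 the data is only wrapped, so the result is slightly larger
than the raw array; how much the higher levels save depends entirely
on the image content.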

I have this information from looking at the pdftex sources, so I'm
quite confident about this statement.

The only ways of creating embedded images in a different format using
pdftex are embedding JPEG (where pdftex simply copies the data stream
from the JPEG file literally into the PDF output, adding the
appropriate wrapper), and embedding pages from an external PDF
document.  The last approach is the most flexible - it allows you to
apply any kind of compression or PDF-supported format, and keeps
pdftex from trying to become a one-stop image manipulation package.
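
The JPEG case can be sketched the same way. Again this is only an
illustration under assumptions, not pdftex's code: jpeg_xobject is a
hypothetical helper, the caller supplies the dimensions (a real
implementation would read them from the JPEG headers), and the sample
bytes are a placeholder rather than a decodable image.

```python
def jpeg_xobject(jpeg_bytes, width, height, colorspace="/DeviceRGB"):
    """Wrap an unmodified JPEG data stream as a PDF image object.

    Hypothetical helper for illustration; width and height are passed in
    by the caller instead of being parsed from the JPEG headers.
    """
    header = (
        "<< /Type /XObject /Subtype /Image "
        f"/Width {width} /Height {height} "
        f"/ColorSpace {colorspace} /BitsPerComponent 8 "
        f"/Filter /DCTDecode /Length {len(jpeg_bytes)} >>\n"
    ).encode("ascii")
    # The JPEG bytes are copied through as-is; nothing is decoded or recompressed.
    return header + b"stream\n" + jpeg_bytes + b"\nendstream\n"

# Placeholder bytes only: start and end markers around padding, not a real JPEG.
fake_jpeg = b"\xff\xd8" + b"\x00" * 16 + b"\xff\xd9"
obj = jpeg_xobject(fake_jpeg, 4, 2)
```

The contrast with the PNG/TIFF path is that the image data is never
expanded into a pixel array here, so the JPEG compression survives
intact in the PDF.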

Hope this helps,
   Otfried