[pdftex] pdftex compression -- proposed addition to manual

Ben Crowell crowell01 at lightandmatter.com
Tue Aug 14 14:45:47 CEST 2001


I'd like to suggest adding the following to the pdftex manual.
Those of you who read comp.text.tex will notice that in a moment
of brain-deadness, I posted it there earlier, when really I meant to
post it here. Oops!

In addition to the information I wrote up below, I'm curious
whether anyone can shed any light on the following:
- AFAICT, there is no native support for PNG images in PDF format.
	What does pdftex do with PNG images? Decompress them?
	Convert them to JPEG? If it converts them to JPEG, what
	compression does it use?
- Does pdftex have any support for CCITT or JBIG2?

		Ben Crowell

---------------------------------------------------------------------------
	proposed addition to manual
---------------------------------------------------------------------------

If you want to make PDF files with their compression tuned up
perfectly for your purposes, then you'll need to understand some of
the technical details about the PDF format below. If you want
to skip the complexities and just produce reasonably well compressed
output files, there are two main things you should know.
First, you should check that compress_level is set to 9, the
maximum, in your pdftex configuration.
Second, you should prepare bitmapped graphics input files
in a compressed format, typically JPEG, that makes an appropriate
tradeoff between compression and image quality; pdftex keeps
the compression of the input image, and does little further
compression of its own.

PDF format has some generic lossless compression capabilities.
Old versions of the format only allowed the LZW compression
algorithm, which is patent-encumbered. Newer versions also allow
the use of the Flate algorithm. Because of the patent issues,
pdftex only supports Flate. If your compress_level is set
appropriately, pdftex will use Flate compression. Flate
compression does a good job of compressing text and
line art.
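
As a rough illustration of how well Flate handles text, here is a
minimal sketch using Python's zlib module, which implements the same
Flate (deflate) algorithm. It assumes that compress_level corresponds
to zlib's 0-9 level scale; the sample text and the name flate_ratio
are made up for the example and are not part of pdftex.

```python
import zlib


def flate_ratio(data: bytes, level: int = 9) -> float:
    """Return compressed size divided by original size for Flate (zlib)."""
    return len(zlib.compress(data, level)) / len(data)


# Hypothetical sample: repetitive text standing in for page content.
sample_text = b"The quick brown fox jumps over the lazy dog. " * 200

if __name__ == "__main__":
    # Compare a fast setting with the maximum setting.
    print(f"level 1: {flate_ratio(sample_text, 1):.3f}")
    print(f"level 9: {flate_ratio(sample_text, 9):.3f}")
```

Real page streams are less repetitive than this sample, so the ratios
printed here are more flattering than what you should expect from an
actual document.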

For bitmapped images, however, Flate alone does not compress
well. If your input images are uncompressed, Flate will shrink
them somewhat, but not as much as a lossless compression
algorithm designed for images would. If your input images are
already in a compressed format such as JPEG, Flate yields
very little further improvement.
PDF format therefore allows the use of several
different compression schemes for images: JPEG,
CCITT, and JBIG2. CCITT and JBIG2 are meant for bilevel
(black and white) images such as scanned text. JPEG is a
more general-purpose lossy compression format for greyscale
and color images.
If you use a JPEG file as an input, pdftex simply copies
it to the output, without changing its resolution or
applying any further compression. (Flate compression will
be applied if you've set compress_level appropriately,
but it has very little effect.)
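
The claim that Flate gains almost nothing on data that is already
compressed can be checked with a small sketch. It makes two
assumptions: pseudo-random bytes stand in for the body of a JPEG
file (both look nearly incompressible to Flate), and a run of
constant pixel values stands in for an uncompressed bitmap. The
names are hypothetical.

```python
import random
import zlib


def flate_gain(data: bytes, level: int = 9) -> float:
    """Fraction of bytes saved by Flate; near zero or negative means no benefit."""
    return 1.0 - len(zlib.compress(data, level)) / len(data)


# Assumption: seeded pseudo-random bytes as a stand-in for JPEG data.
rng = random.Random(0)
already_compressed = bytes(rng.getrandbits(8) for _ in range(50_000))

# Assumption: a mostly white area followed by a black area, one byte per pixel.
raw_bitmap = bytes([255] * 40_000 + [0] * 10_000)

if __name__ == "__main__":
    print(f"raw bitmap saved:         {flate_gain(raw_bitmap):.1%}")
    print(f"already compressed saved: {flate_gain(already_compressed):.1%}")
```

A real photograph stored uncompressed falls somewhere between these
two extremes, which is why a format designed for images does better.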

A typical method of working with compressed images would
be to maintain all your original images in a lossless
format such as PNG, and produce JPEG versions as inputs
to pdftex. You can tune up the resolution and compression
level of the JPEG versions to achieve the desired tradeoff
between compression and image quality in your output file.
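
One way to script the PNG-to-JPEG step is sketched below. It assumes
ImageMagick's convert program is installed (newer ImageMagick
releases call it magick) and uses its -resize and -quality options;
the file names and the helper jpeg_command are hypothetical. The
sketch only builds and prints the command, so nothing is converted
until you run it yourself.

```python
def jpeg_command(src: str, dst: str, quality: int, scale_percent: int) -> list[str]:
    """Build an ImageMagick command line that writes a scaled JPEG copy of src."""
    if not 1 <= quality <= 100:
        raise ValueError("quality must be between 1 and 100")
    if scale_percent <= 0:
        raise ValueError("scale_percent must be positive")
    return [
        "convert", src,
        "-resize", f"{scale_percent}%",   # lower the resolution
        "-quality", str(quality),         # JPEG quality setting
        dst,
    ]


if __name__ == "__main__":
    # Hypothetical file names; the lossless PNG original is left untouched.
    cmd = jpeg_command("figure.png", "figure.jpg", quality=85, scale_percent=50)
    print(" ".join(cmd))
```

The resulting list can be passed to subprocess.run. Trying a few
quality and scale values and comparing the output file sizes is a
simple way to find the tradeoff you want.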



