[pdftex] pdftex compression -- proposed addition to manual
Ben Crowell
crowell01 at lightandmatter.com
Mon Aug 20 14:03:08 CEST 2001
Thanks, Siep Kroonenberg, for your comments! I took the liberty
of incorporating them more or less verbatim into the proposed
addition to the manual. Is this OK with you?
However, this just makes me wish more for an answer to the question
I previously posed about what pdflatex actually does with PNG
images, since, AFAICT from the Adobe docs, PNG can't be incorporated
in PDF without converting to some other format such as JPEG, CCITT,...
Since the people who maintain the manual haven't responded, I'm
wondering if this list is even the right place to discuss it. I couldn't
find their e-mails on the manual's webpage
(http://www.tug.org/applications/pdftex/) before, but now I notice that
their addresses are given inside the manual. Do they read this list, or
do I need to e-mail them?
---------------------------------------------------------------------------
proposed addition to manual
---------------------------------------------------------------------------
If you want to tune the compression of your PDF files precisely
for your purposes, you'll need to understand some of the technical
details of the PDF format described below. If you'd rather skip
the complexities and just produce reasonably well-compressed
output files, there are two main things you should know.
First, check that compress_level is set to 9.
Second, prepare bitmapped graphics input files in a compressed
format, typically JPEG or PNG, that makes an appropriate
tradeoff between compression and image quality; pdftex retains
the compression of the input image, but does little further
compression of its own.
The PDF format has some generic lossless compression capabilities.
Early versions of the format allowed only the LZW compression
algorithm, which is patent-encumbered. Newer versions also allow
the Flate algorithm. Because of the patent issues, pdftex supports
only Flate. If compress_level is set to a nonzero value (9 gives
the most compression), pdftex will use Flate compression, which
does a good job on text and line art.
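
As a rough illustration of how well Flate handles text: Flate is the
deflate algorithm, which Python's standard zlib module implements. The
sketch below assumes that compress_level corresponds to zlib's own
0-9 level, and the sample text is a made-up stand-in for a page's
content, not anything pdftex itself produces.

```python
import zlib

# Hypothetical sample: repetitive text standing in for page content.
text = ("The quick brown fox jumps over the lazy dog. " * 200).encode("ascii")

# Level 9 is zlib's maximum-effort setting (assumed here to match
# what compress_level 9 asks for).
packed = zlib.compress(text, 9)

ratio = len(packed) / len(text)
print(f"{len(text)} bytes -> {len(packed)} bytes (ratio {ratio:.3f})")
```

Real documents are less repetitive than this sample, so expect a
more modest ratio in practice.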
For bitmapped images, however, Flate alone doesn't give good
results. If your input images are uncompressed, Flate will
compress them somewhat, but not as much as a lossless
algorithm designed for images. If your input images are already
in a compressed format such as JPEG, Flate yields very little
further improvement.
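
The point about already-compressed input can be seen with a small
experiment. This is only a sketch: the data is Flate output fed back
into Flate, used as a hypothetical stand-in for a JPEG file (real JPEG
data is encoded differently, but is similarly dense).

```python
import random
import zlib

# Hypothetical sample data: pseudo-random words from a fixed seed.
rng = random.Random(0)
words = ["pdf", "image", "flate", "stream", "pixel", "page", "font", "color"]
raw = " ".join(rng.choice(words) for _ in range(20000)).encode("ascii")

once = zlib.compress(raw, 9)    # first pass: large saving
twice = zlib.compress(once, 9)  # second pass: little or no saving

print("raw:  ", len(raw))
print("once: ", len(once))
print("twice:", len(twice))
```

The second pass typically leaves the size about the same, and can
even make it slightly larger.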
The PDF format therefore allows several different compression
schemes for images: JPEG, CCITT, and JBIG2. CCITT and JBIG2 are
meant for bilevel (black-and-white) images such as scanned text.
JPEG is a more general-purpose lossy format for greyscale and
color images, but it is optimized for photographs.
If you use a JPEG file as an input, pdftex simply copies
it to the output, without changing its resolution or
applying any further compression. (Flate compression will
be applied if you've set compress_level appropriately,
but it has very little effect.)
A typical way of working with compressed images is to keep
all your original images in a lossless format such as PNG,
and produce JPEG versions as inputs to pdftex. You can adjust
the resolution and compression level of the JPEG versions to
achieve the desired tradeoff between compression and image
quality in your output file.
JPEG, however, is not always the best compressed format for
images. If the image consists of flat areas and discrete colors
(e.g. screenshots or diagrams) then lossless compressed formats such as PNG
are quite efficient, whereas JPEG compression would introduce artifacts.
The use of JPEG should be limited to photographic and comparable images,
for which it produces compression much better than PNG (typically by
about a factor of 4), without noticeably
affecting the visual quality of the images.
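
To see why flat-colored images suit lossless formats, note that PNG
compresses pixel data with the same deflate algorithm as Flate (after
a per-row filtering step, which this sketch omits). The two "images"
below are hypothetical raw 8-bit greyscale buffers, and seeded noise
is only a crude, extreme stand-in for photographic detail.

```python
import random
import zlib

WIDTH, HEIGHT = 256, 256

# Hypothetical "diagram": four flat horizontal bands of grey.
flat = bytes((y // 64) * 80 for y in range(HEIGHT) for _ in range(WIDTH))

# Hypothetical "photograph": every pixel differs (worst case for deflate).
rng = random.Random(0)
noisy = bytes(rng.getrandbits(8) for _ in range(WIDTH * HEIGHT))

flat_packed = zlib.compress(flat, 9)
noisy_packed = zlib.compress(noisy, 9)

print("flat: ", len(flat), "->", len(flat_packed))
print("noisy:", len(noisy), "->", len(noisy_packed))
```

Real photographs fall somewhere between these two extremes, which
is where a lossy format such as JPEG earns its keep.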