[pdftex] TeX as a composition server?

Reinhard Kotucha reinhard.kotucha at web.de
Sun Oct 24 00:03:37 CEST 2010

On 23 October 2010 John Culleton wrote:

 > On Saturday 23 October 2010 04:41:12 Peter Davis wrote:
 > > I'm looking at the possibility of using TeX as a composition
 > > server, something to compose blocks of text or pages in a high
 > > volume workflow. From what I've learned, TeX, and in particular
 > > pdfTeX, is capable of producing output that's very similar to
 > > InDesign composition, with suitable parameters.  So I have a few
 > > questions perhaps this audience can help with.
 > >
 > > 1) Is there any way to gauge roughly what kind of throughput I
 > > could get? Could a single TeX process on a state-of-the-art Intel
 > > box, for example, produce hundreds of pages per minute?
 > > Thousands?  Tens or hundreds of thousands?  (I'm assuming A4 or
 > > letter pages of just text.)
 > A book with 402 pages, plenty of footnotes and two passes with
 > makeindex in between took about 11 seconds in a graphic window.
 > From a command line in a console session it took about 10
 > seconds. A single pass from the console session took 4 seconds.
 > From those numbers I estimate a rate of 100 pages a second or 6000
 > per minute.

Sure, printing to screen takes some time, hence I usually run
benchmark tests in \batchmode.  Maybe it's possible to suppress output
written to the log file too.  This seems to work:

  $ ln -s /dev/null texput.log
  $ pdftex \\relax hello \\bye

John, I suppose that you are using plain TeX.  Did you try to compile
The TeXbook (480 pages)?  It takes 0.66 seconds on a 2.5 GHz AMD CPU.
This makes about 700 pages per second.  Maybe in your case it's file I/O
which slows things down.  Can you run your files from a RAM disk?
On Gentoo Linux it's /dev/shm, maybe on Slackware too.
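
The rates quoted above are just page count divided by wall-clock time.
A minimal sketch for reproducing such a measurement follows.  The
function names are made up for illustration, the page count has to be
supplied by hand, and `date +%s` only resolves whole seconds, so very
short runs need a finer timer.

```shell
# pages_per_second PAGES SECONDS
# Prints the throughput, truncated to an integer.
pages_per_second() {
    awk -v p="$1" -v s="$2" 'BEGIN { printf "%d\n", p / s }'
}

# time_batch_run FILE
# Hypothetical helper: runs pdftex once in batch mode (no terminal
# output) and prints the elapsed wall-clock time in whole seconds.
time_batch_run() {
    start=$(date +%s)
    pdftex -interaction=batchmode "$1" >/dev/null 2>&1
    end=$(date +%s)
    echo $((end - start))
}
```

With the figures from this thread, `pages_per_second 480 0.66` prints
727 and `pages_per_second 402 4` prints 100.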

LaTeX seems to be significantly slower, and I don't think this is only
caused by file I/O.

 > Memory is not a critical issue. TeX reads its input files
 > sequentially and writes the output sequentially.  Some values are
 > stored internally during a run. The pdftex program requires
 > 1,356,692 bytes. pdflatex uses the same program, but acts
 > differently because it is called with a different name.

I suppose that even less memory is needed for the program itself
because modern operating systems don't load the whole program into
memory but only those memory pages which are actually needed.  On the
other hand, the format file and macro packages are loaded into memory
too.  Nevertheless, main memory is indeed not critical nowadays.

Performance doesn't only depend on the amount of RAM and CPU speed.
Some time ago I encountered another issue.  I had to compile some huge
(7,000 pages) LaTeX files.  There was a lot of arithmetic involved
(pgfplots), hence the result isn't representative.  I noticed that on
a server with Xeon CPUs the files were processed about twice as fast as
on my machine, though the clock frequency was only 1.8 GHz (instead of
2.5 GHz here).  

The difference was the amount of CPU cache (512 kB here, 6 MB on the
server).  Sure, CPUs are much faster than RAM.  But maybe it doesn't
matter so much in general because usually file I/O is the
bottleneck.  In my case TeX spent most of the time doing things it wasn't
designed for.  But I think that it's desirable to keep things small so
that as much as possible fits into the cache.

 > > 2) Is it only pdfTeX which uses hz-program-like composition, with
 > > glyph scaling, etc.?  If so, is it possible to use pdfTeX to
 > > produce .dvi (or does .dvi prohibit the use of glyph scaling)?  I'd
 > > like to be able to generate bitmaps for JPEG/GIF/PNG output as well
 > > as PDF.
 > >
 > All versions of TeX can use the hz adjustment. But sometimes the 
 > instructions are hard to find.

AFAIK XeTeX doesn't support font expansion.  Thanh uploaded the
character protrusion code to the XeTeX repository a few months ago but
I'm not sure whether it's already used.  Currently only pdfTeX and
LuaTeX support HZ.  It's planned for XeTeX AFAIK, because so many
people asked for it, but Jonathan is too busy ATM.

I'm also not sure whether pdfTeX supports glyph scaling in DVI mode.
It's quite difficult.  Formerly font expansion was realized by
inserting one and the same Type 1 font more than once, each time with
a modified FontMatrix.  This obviously led to large files.  Another
drawback was that it worked with Type 1 fonts only.  TrueType fonts
don't have a FontMatrix.

Nowadays pdfTeX uses the TextMatrix for glyph scaling instead.  But the
TextMatrix is a PDF feature.  It works with any kind of font but it
can't be used in DVI mode.  This also means that implementing font
expansion in XeTeX is difficult.
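
For anyone who wants to try font expansion in PDF mode, here is a small
sketch that writes a plain TeX test file.  The function name and the
file name hz-demo.tex are invented for this example, and the
stretch/shrink/step values are arbitrary illustrative choices, not
recommendations.

```shell
# write_hz_demo [FILE]
# Writes a plain pdfTeX test file that enables font expansion.
# The default file name hz-demo.tex is a placeholder.
write_hz_demo() {
    cat > "${1:-hz-demo.tex}" <<'EOF'
\pdfoutput=1                 % produce PDF, not DVI
\pdfadjustspacing=2          % let the line breaker take expansion into account
\pdffontexpand\tenrm 20 20 5 autoexpand  % stretch, shrink, step (illustrative values)
\hsize=6cm
Some sample text that is long enough to be broken into several lines,
so that the effect of font expansion on justification becomes visible.
\bye
EOF
}
```

Compiling the file with pdftex once with and once without the
\pdffontexpand line should make the difference in spacing visible.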

 > > 3) Will pdfTeX work with all the standard font formats?
 > >
 > Most forms of TeX deal with a dedicated font library with many 
 > choices. Most of them are Type 1. Luatex and Lualatex will go to the 
 > standard system font libraries instead and can use TTF and OTF fonts, 
 > but at the moment not Type 1, which still need to be installed in the 
 > dedicated TeX font library. Font usage is a complicated thing in TeX 
 > but with effort any font can be used.

To be more precise: pdfTeX supports Type 1, TrueType and Type 3 fonts
only.  Type 3 fonts are usually bitmaps created by Metafont.  Though
TrueType fonts are self-contained, you have to create tfm files in
order to use them.  The reason pdfTeX can't derive the metrics from
the TTF file itself is that it supports 8-bit fonts only.

BTW, the term "system fonts" is a bit misleading.  It depends on what
you regard as a font.  If you regard a font as a collection of glyphs,
then every TeX engine can use system fonts.  All you have to do is to
set the variable OSFONTDIR in texmf.cnf appropriately.  When Jonathan
speaks about using system fonts, he means that the metrics can be
derived from the fonts themselves (TTF, OTF) or from the AFM files
(Type 1), without the need to create TFM files.
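
As a sketch of the OSFONTDIR setting mentioned above: Kpathsea search
paths are colon-separated on Unix, and a trailing // makes a directory
searched recursively.  The helper below only builds such a value; its
name and the example directories are assumptions for illustration.

```shell
# osfontdir_value DIR...
# Joins the given directories into a Kpathsea path.  The trailing "//"
# asks Kpathsea to search each directory recursively.
osfontdir_value() {
    out=""
    for d in "$@"; do
        out="${out:+$out:}${d%/}//"
    done
    printf '%s\n' "$out"
}

# Example line for texmf.cnf (the directories are placeholders):
printf 'OSFONTDIR = %s\n' "$(osfontdir_value /usr/share/fonts /usr/local/share/fonts)"
```

The value currently in effect can be checked with
`kpsewhich -var-value=OSFONTDIR`.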

XeTeX can process fonts directly (without TFM files) and I suppose
that LuaTeX supports it as well.  

A few thoughts about Peter's problem:

If the HZ algorithm is a requirement, only pdfTeX and LuaTeX can be
used.  If font formats other than Type 1 and TrueType are required, pdfTeX
can't be used.  LuaTeX makes all the internal data structures in TeX
and OpenType fonts accessible and thus is slower and sometimes
consumes a lot of memory.  The memory consumption can be problematic
if several files are created at the same time by several processes
forked by a web server.

Though XeTeX doesn't support glyph scaling yet, there is one
advantage, at least:  XeTeX itself creates an "eXtended DVI" file and
pipes it into xdvipdfmx in order to create PDF output.  This improves
performance if you have more than one CPU.


Reinhard Kotucha			              Phone: +49-511-3373112
Marschnerstr. 25
D-30167 Hannover	                      mailto:reinhard.kotucha at web.de
Microsoft isn't the answer. Microsoft is the question, and the answer is NO.
