Web2c: A TeX implementation


Web2c

This document describes how to install and use the programs in the Web2c implementation of the TeX system, especially for Unix systems. It corresponds to Web2c version 7.5.7, released in July 2008.


1 Introduction

This manual corresponds to version 7.5.7 of Web2c, released in July 2008.

Web2c is the name of a TeX implementation, originally for Unix, but now also running under DOS, Amiga, and other operating systems. By TeX implementation, we mean all of the standard programs developed by the Stanford TeX project directed by Donald E. Knuth: Metafont, DVItype, GFtoDVI, BibTeX, Tangle, etc., as well as TeX itself. Other programs are also included: DVIcopy, written by Peter Breitenlohner, MetaPost and its utilities (derived from Metafont), by John Hobby, etc.

General strategy: Web2c works, as its name implies, by translating the WEB source in which TeX is written into C source code. Its output is not self-contained, however; it makes extensive use of many macros and functions in a library (the web2c/lib directory in the sources). Therefore, it will not work without change on an arbitrary WEB program.

Availability: All of Web2c is freely available—“free” both in the sense of no cost (free ice cream) and of having the source code to modify and/or redistribute (free speech). See unixtex.ftp, for the practical details of how to obtain Web2c.

Different parts of the Web2c distribution have different licensing terms, however, reflecting the different circumstances of their creation; consult each source file for exact details. The main practical implication for redistributors of Web2c is that the executables are covered by the GNU General Public License, and therefore anyone who gets a binary distribution must also get the sources, as explained by the terms of the GPL (see Copying). The GPL covers the Web2c executables, including tex, because the Free Software Foundation sponsored the initial development of the Kpathsea library that Web2c uses. The basic source files from Stanford, however, have their own copyright terms or are in the public domain, and are not covered by the GPL.

History: Tomas Rokicki originated the TeX-to-C system in 1987, working from the first change files for TeX under Unix, which were done primarily by Howard Trickey and Pavel Curtis. Tim Morgan then took over development and maintenance for a number of years; the name changed to Web-to-C somewhere in there. In 1990, Karl Berry became the maintainer. He made many changes to the original sources, and started using the shorter name Web2c. In 1997, Olaf Weber took over. Dozens of other people have contributed; their names are listed in the ChangeLog files.

Other acknowledgements: The University of Massachusetts at Boston (particularly Rick Martin and Bob Morris) has provided computers and ftp access to me for many years. Richard Stallman at the Free Software Foundation employed me while I wrote the original path searching library (for the GNU font utilities). (rms also gave us Emacs, GDB, and GCC, without which I cannot imagine developing Web2c.) And, of course, TeX would not exist in the first place without Donald E. Knuth.

Further reading: See References.


2 Installation

(A copy of this chapter is in the distribution file web2c/INSTALL.)

Installing Web2c is mostly the same as installing any other Kpathsea-using program. Therefore, for the basic steps involved, see Installation. (A copy is in the file kpathsea/INSTALL.)

One peculiarity to Web2c is that the source distribution comes in two files: web.tar.gz and web2c.tar.gz. You must retrieve and unpack them both. (We have two because the former archive contains the very large and seldom-changing original WEB source files.) See unixtex.ftp.

Another peculiarity is the MetaPost program. Although it has been installed previously as mp, as of Web2c 7.0 the installed name is now mpost, to avoid conflict with the mp program that does prettyprinting. This approach was recommended by the MetaPost author, John Hobby. If you as the TeX administrator wish to make it available under its shorter name as well, you will have to set up a link or some such yourself. And of course individual users can do the same.

For solutions to common installation problems and information on how to report a bug, see the file kpathsea/BUGS (see Bugs). See also the Web2c home page, http://www.tug.org/web2c.

Points worth repeating:


2.1 configure options

This section gives pointers to descriptions of the ‘--with’ and ‘--enable’ configure arguments that Web2c accepts. Some are specific to Web2c, others are generic to all Kpathsea-using programs.

For a list of all the options configure accepts, run ‘configure --help’. The generic options are listed first, and the package-specific options come last.

For a description of the generic options (which mainly allow you to specify installation directories) and basic configure usage, see Running configure scripts; a copy is in the file kpathsea/CONFIGURE.

--disable-dump-share
Do not make fmt/base/mem files sharable across different endian architectures. See Hardware and memory dumps.
--without-maketexmf-default
--without-maketexpk-default
--without-maketextfm-default
--with-maketextex-default
Enable or disable the dynamic generation programs. See mktex configuration. The defaults are the inverse of the options, i.e., everything is enabled except mktextex.
--enable-shared
Build Kpathsea as a shared library. See Shared library.
--with-editor=cmd
Change the default editor invoked by the ‘e’ interactive command. See Editor invocation.
--with-epsfwin
--with-hp2627win
--with-mftalkwin
--with-nextwin
--with-regiswin
--with-suntoolswin
--with-tektronixwin
--with-unitermwin
--with-x
--with-x-toolkit=KIT
--with-x11win
--with-x11
Define Metafont graphics support; by default, no graphics support is enabled. See Online Metafont graphics.
--x-includes=dir
--x-libraries=dir
Define the locations of the X11 include files and libraries (by default, configure does its best to guess). See Optional Features. A copy is in kpathsea/CONFIGURE.


2.2 Compile-time options

In addition to the configure options listed in the previous section, there are a few things that can be affected at compile-time with C definitions, rather than with configure. Using any of these is unusual.

To specify extra compiler flags (‘-Dname’ in this case), the simplest thing to do is:

     make XCFLAGS="ccoptions"

You can also set the CFLAGS environment variable before running configure. See configure environment.

Anyway, here are the possibilities:

-DFIXPT
-DNO_MF_ASM
Use the original WEB fixed-point routines for Metafont and MetaPost arithmetic calculations regarding fractions. By default, assembly-language routines are used on x86 hardware with GNU C (unless ‘NO_MF_ASM’ is defined), and floating-point routines are used otherwise.
-DIPC_DEBUG
Report on various interprocess communication activities. See IPC and TeX.


2.3 Additional targets

Web2c has several Make targets besides the standard ones. You can invoke these either in the top level directory of the source distribution (the one containing kpathsea/ and web2c/), or in the web2c/ directory.

c-sources
Make only the C files, translated from the Web sources, presumably because you want to take them to a non-Unix machine.
formats
install-formats
Make or install all the memory dumps (see Memory dumps). By default, the standard plain formats plus latex.fmt are made. You can add other formats by redefining the fmts, bases, and mems variables. See the top of web2c/Makefile for the possibilities.
fmts
install-fmts
Make or install the TeX .fmt files. See Initial TeX.
bases
install-bases
Make or install the Metafont .base files. See Initial Metafont.
mems
install-mems
Make or install the MetaPost .mem files. See Initial MetaPost.
triptrap
trip
trap
mptrap
Run the torture tests: triptrap runs all of them, while trip, trap, and mptrap run the tests for TeX, Metafont, and MetaPost (respectively). See the next section.


2.4 Trip, trap, and mptrap: Torture tests

To validate your TeX, Metafont, and MetaPost executables, run ‘make triptrap’. This runs the trip, trap, and mptrap “torture tests”. See the files triptrap/tripman.tex, triptrap/trapman.tex, and triptrap/mptrap.readme for detailed information and background on the tests.

The differences between your executables' behavior and the standard values will show up on your terminal. The usual differences (these are all acceptable) are:

Any other differences are trouble. The most common culprit in the past has been compiler bugs, especially when optimizing. See TeX or Metafont failing.

The files trip.diffs, mftrap.diffs, and mptrap.diffs in the triptrap directory show the standard diffs against the original output. If you diff your diffs against these files, you should come up clean. For example:

     make trip >&mytrip.diffs
     diff triptrap/trip.diffs mytrip.diffs

To run the tests separately, use the targets trip, trap, and mptrap.

To run simple tests for all the programs as well as the torture tests, run ‘make check’. You can compare the output to the distributed file tests/check.log if you like.


2.5 Runtime options

Besides the configure- and compile-time options described in the previous sections, you can control a number of parameters (in particular, array sizes) in the texmf.cnf runtime file read by Kpathsea (see Config files).

Rather than exhaustively listing them here, please see the last section of the distributed kpathsea/texmf.cnf. Some of the more interesting values:

main_memory
Total words of memory available, for TeX, Metafont, and MetaPost. You must remake the format file after changing this value.
extra_mem_bot
Extra space for “large” TeX data structures: boxes, glue, breakpoints, et al. If you use PiCTeX, you may well want to set this.
font_mem_size
Words of font info available for TeX; this is approximately the total size of all TFM files read.
hash_extra
Additional space for the hash table of control sequence names. Approximately 10,000 control sequences can be stored in the main hash table; if you have a large book with numerous cross-references, this might not be enough, and thus you will want to set hash_extra.

Of course, ideally all arrays would be dynamically expanded as necessary, so the only limiting factor would be the amount of swap space available. Unfortunately, implementing this is extremely difficult, as the fixed size of arrays is assumed in many places throughout the source code. These runtime limits are a practical compromise between the compile-time limits in previous versions, and truly dynamic arrays. (On the other hand, the Web2c BibTeX implementation does do dynamic reallocation of some arrays.)


3 Commonalities

Many aspects of the TeX system are the same among more than one program, so we describe all those pieces together, here.


3.1 Option conventions

To provide a clean and consistent behavior, we chose to have all these programs use the GNU function getopt_long_only to parse command lines. However, we do use it in a restricted mode, where all the options have to come before the rest of the arguments.

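As a result, you can start an option name with either ‘-’ or ‘--’; use any unambiguous abbreviation of an option name; separate an option name from its value with either ‘=’ or whitespace; and pass a filename that would otherwise look like an option by putting it after a ‘--’ argument.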

By convention, non-option arguments, if specified, generally define the name of an input file, as documented for each program.

If a particular option with a value is given more than once, it is the last value that counts.

For example, the following command line specifies the options ‘foo’, ‘bar’, and ‘verbose’; gives the value ‘baz’ to the ‘abc’ option, and the value ‘xyz’ to the ‘quux’ option; and specifies the filename -myfile-.

     -foo --bar -verb -abc=baz -quux karl --quux xyz -- -myfile-
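
The conventions above can be illustrated with a minimal C sketch built on getopt_long_only. It is not taken from the Web2c sources: the names struct cmdline and parse_cmdline are hypothetical, and the option table simply mirrors the example command line. The leading ‘+’ in the short-option string is the GNU getopt convention for stopping at the first non-option argument, which is one way to obtain the restricted mode.

```c
#include <assert.h>
#include <getopt.h>
#include <stddef.h>

/* Hypothetical result record for the example command line above.  */
struct cmdline
{
  int foo, bar, verbose;   /* flags: nonzero if the option was seen */
  const char *abc, *quux;  /* option values; the last one given wins */
  const char *file;        /* first non-option argument, or NULL */
};

/* Parse argv in the restricted mode: options first, then the rest.
   Returns 0 on success, -1 on an unknown or malformed option.  */
static int
parse_cmdline (int argc, char **argv, struct cmdline *out)
{
  static const struct option longopts[] = {
    { "foo",     no_argument,       NULL, 'f' },
    { "bar",     no_argument,       NULL, 'b' },
    { "verbose", no_argument,       NULL, 'v' },
    { "abc",     required_argument, NULL, 'a' },
    { "quux",    required_argument, NULL, 'q' },
    { NULL, 0, NULL, 0 }
  };
  int c;

  assert (argc > 0 && argv != NULL && out != NULL);
  out->foo = out->bar = out->verbose = 0;
  out->abc = out->quux = out->file = NULL;
  optind = 1;  /* restart scanning, in case of an earlier call */

  /* The leading '+' makes getopt stop at the first non-option.  */
  while ((c = getopt_long_only (argc, argv, "+", longopts, NULL)) != -1)
    switch (c)
      {
      case 'f': out->foo = 1; break;
      case 'b': out->bar = 1; break;
      case 'v': out->verbose = 1; break;
      case 'a': out->abc = optarg; break;   /* later values overwrite */
      case 'q': out->quux = optarg; break;  /* later values overwrite */
      default:  return -1;                  /* getopt already complained */
      }

  if (optind < argc)
    out->file = argv[optind];  /* e.g. the filename after "--" */
  return 0;
}
```

Called with the example arguments, this sketch sets all three flags, leaves ‘baz’ in abc and ‘xyz’ in quux (the earlier ‘karl’ is overwritten), and reports -myfile- as the filename.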


3.2 Common options

All of these programs accept the standard GNU ‘--help’ and ‘--version’ options, and several programs accept ‘--verbose’. Rather than writing identical descriptions for every program, they are described here.

--help
Print a usage message listing basic usage and all available options to standard output, then exit successfully.
--verbose
Print progress reports to standard output.
--version
Print the version number to standard output, then exit successfully.

TeX, Metafont, and MetaPost have a number of additional options in common:

-file-line-error
-no-file-line-error
Change (or do not change) the way error messages are printed. The alternate style looks like error messages from many compilers and is easier to parse for some editors that invoke TeX. This option used to be called ‘-file-line-error-style’.
-fmt=dumpname
-base=dumpname
-mem=dumpname
Use dumpname instead of the program name or a ‘%&’ line to determine the name of the memory dump file read (‘fmt’ for TeX, ‘base’ for Metafont, ‘mem’ for MetaPost). See Memory dumps. Also sets the program name to dumpname if no ‘-progname’ option was given.
-halt-on-error
Stop processing and exit when an error occurs, as opposed to the normal process of trying to recover and continue.
-ini
Enable the “initial” form of the program (see Initial and virgin). This is implicitly set if the program name is initex resp. inimf resp. inimpost, although these variants are no longer typically installed.
-interaction=string
Set the interaction mode from the command line. The string must be one of ‘batchmode’, ‘nonstopmode’, ‘scrollmode’, or ‘errorstopmode’.
-jobname=string
Set the job name to string, instead of deriving it from the name of the input file.
-kpathsea-debug=number
Set path searching debugging flags according to the bits of number (see Debugging). You can also specify this in the KPATHSEA_DEBUG environment variable (for all Web2c programs). (The command line value overrides.) The most useful value is ‘-1’, to get all available output.
-output-directory=dirname
Specify the directory dirname to which output files are written. Also look for input files in dirname first, before looking along the normal search path. This is useful when you are in some read-only distribution directory, perhaps on a CD-ROM, and want to TeX some documentation, for example. Note that for input files the “search” in dirname does not use the full generality of the search mechanism. This means that some files are not found there even though you might expect them to be.
-parse-first-line
-no-parse-first-line
Check or disable checking whether the first line of the main input file starts with ‘%&’, and parse it if it does. This line can be used to specify the format and/or a TCX file.
-progname=string
Set program (and memory dump) name to string. This may affect the search paths and other values used (see Config files). Using this option is equivalent to making a link named string to the binary and then invoking the binary under that name. See Memory dumps.
-recorder
Enable the filename recorder. This makes the program save a list of the opened files into a file with (by default) extension ‘.fls’. For Omega, this option is always on, and the file has extension ‘.ofl’.
-translate-file=tcxfile
Use tcxfile to define which characters are printable and translations between the internal and external character sets. Moreover, tcxfile can be explicitly declared in the first line of the main input file ‘%& -translate-file=tcxfile’. This is the recommended method for portability reasons. See TCX files.
-8bit
This option specifies that by default all characters should be considered printable. If ‘-translate-file’ was given as well, then the TCX file may mark characters as non-printable.
-oem
This option is specific to Windows. When specified, TeX engines will use the OEM code page rather than the ANSI one to display their messages.


3.3 Path searching

All of the Web2c programs, including TeX, which do path searching use the Kpathsea routines to do so. The precise names of the environment and configuration file variables which get searched for particular file formats are therefore documented in the Kpathsea manual (see Supported file formats). Reading texmf.cnf (see Config files), invoking mktex... scripts (see mktex scripts), and so on are all handled by Kpathsea.

The programs which read fonts make use of another Kpathsea feature: texfonts.map, which allows arbitrary aliases for the actual names of font files; for example, ‘Times-Roman’ for ‘ptmr8r.tfm’. The distributed (and installed by default) texfonts.map includes aliases for many widely available PostScript fonts by their PostScript names.


3.4 Output file location

All the programs generally follow the usual convention for output files. Namely, they are placed in the directory current when the program is run, regardless of any input file location; or, in a few cases, output is to standard output.

For example, if you run ‘tex /tmp/foo’, the output will be in ./foo.dvi and ./foo.log, not /tmp/foo.dvi and /tmp/foo.log.

You can use the ‘-output-directory’ option to cause all output files that would normally be written in the current directory to be written in the specified directory instead. See Common options.

If the current directory is not writable, and ‘-output-directory’ is not specified, the main programs (TeX, Metafont, MetaPost, and BibTeX) make an exception: if the config file value TEXMFOUTPUT is set (it is not by default), output files are written to the directory specified.


3.5 Three programs: Metafont, MetaPost, and TeX

TeX, Metafont, and MetaPost have a number of features in common. Besides the ones here, the common command-line options are described in Common options. The configuration file options that let you control some array sizes and other features are described in Runtime options.


3.5.1 Initial and virgin

The TeX, Metafont, and MetaPost programs each have two main variants, called initial and virgin. As of Web2c 7, one executable suffices for both variants, and in fact, the ini... executables are no longer created.

The initial form is enabled if:

  1. the ‘-ini’ option was specified; or
  2. the program name is initex resp. inimf resp. inimpost (these variants are no longer typically installed); or
  3. the first line of the main input file is ‘%&ini’;
otherwise, the virgin form is used.

The virgin form is the one generally invoked for production use. The first thing it does is read a memory dump (see Determining the memory dump to use); it then proceeds with the main job.

The initial form is generally used only to create memory dumps (see the next section). It starts up more slowly than the virgin form, because it must do lengthy initializations that are encapsulated in the memory dump file.


3.5.2 Memory dumps

In typical use, TeX, Metafont, and MetaPost require a large number of macros to be predefined; therefore, they support memory dump files, which can be read much more efficiently than ordinary source code.


3.5.2.1 Creating memory dumps

The programs all create memory dumps in slightly idiosyncratic (though substantially similar) ways, so we describe the details in separate sections (references below). The basic idea is to run the initial version of the program (see Initial and virgin), read the source file to define the macros, and then execute the \dump primitive.

Also, each program uses a different filename extension for its memory dumps, since although they are completely analogous they are not interchangeable (TeX cannot read a Metafont memory dump, for example).

Here is a list of filename extensions with references to examples of creating memory dumps:

TeX
(‘.fmt’) See Initial TeX.
Metafont
(‘.base’) See Initial Metafont.
MetaPost
(‘.mem’) See Initial MetaPost.

When making memory dumps, the programs read environment variables and configuration files for path searching and other values as usual. If you are making a new installation and have environment variables pointing to an old one, for example, you will probably run into difficulties.


3.5.2.2 Determining the memory dump to use

The virgin form (see Initial and virgin) of each program always reads a memory dump before processing normal source input. All three programs determine the memory dump to use in the same way:

  1. If the first non-option command-line argument begins with ‘&’, the program uses the remainder of that argument as the memory dump name. For example, running ‘tex \&super’ reads super.fmt. (The backslash protects the ‘&’ against interpretation by the shell.)
  2. If the ‘-fmt’ resp. ‘-base’ resp. ‘-mem’ option is specified, its value is used.
  3. If the ‘-progname’ option is specified, its value is used.
  4. If the first line of the main input file (which must be specified on the command line, not in response to ‘**’) is %&dump, and dump is an existing memory dump of the appropriate type, dump is used.

    The first line of the main input file can also specify which character translation file is to be used: %&-translate-file=tcxfile (see TCX files).

    These two roles can be combined: %&dump -translate-file=tcxfile. If this is done, the name of the dump must be given first.

  5. Otherwise, the program uses the program invocation name, most commonly tex resp. mf resp. mpost. For example, if latex is a link to tex, and the user runs ‘latex foo’, latex.fmt will be used.
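
The selection rules above can be summarized in a small C sketch. It assumes that the numbered list is in order of precedence, which the text does not state outright, and that the caller has already verified that a dump named on a ‘%&’ line exists. The names struct dump_inputs and choose_dump are hypothetical and do not come from the Web2c sources.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical inputs gathered from the command line and the first
   line of the main input file; NULL means "not given".  */
struct dump_inputs
{
  const char *first_arg;        /* first non-option argument */
  const char *dump_option;      /* value of -fmt, -base, or -mem */
  const char *progname_option;  /* value of -progname */
  const char *first_line_dump;  /* existing dump named on a %& line */
  const char *invocation_name;  /* e.g. "tex" or "latex"; never NULL */
};

/* Return the memory dump name (without extension), trying each
   source in the order of the list above.  */
static const char *
choose_dump (const struct dump_inputs *in)
{
  assert (in != NULL && in->invocation_name != NULL);

  /* 1. An argument such as "&super" names the dump directly.  */
  if (in->first_arg && in->first_arg[0] == '&' && in->first_arg[1] != '\0')
    return in->first_arg + 1;
  /* 2. An explicit -fmt, -base, or -mem option.  */
  if (in->dump_option)
    return in->dump_option;
  /* 3. The -progname option.  */
  if (in->progname_option)
    return in->progname_option;
  /* 4. A %&dump first line naming an existing dump.  */
  if (in->first_line_dump)
    return in->first_line_dump;
  /* 5. Fall back to the name the program was invoked under.  */
  return in->invocation_name;
}
```

For instance, with only invocation_name set to ‘latex’, the sketch returns ‘latex’, matching the latex.fmt example in the last rule.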


3.5.2.3 Hardware and memory dumps

By default, memory dump files are generally sharable between architectures of different types; specifically, on machines of different endianness (see Byte order). (This is a feature of the Web2c implementation, and is not true of all TeX implementations.) If you specify ‘--disable-dump-share’ to configure, however, memory dumps will be endian-dependent.

The reason to do this is speed. To achieve endian-independence, the reading of memory dumps on LittleEndian architectures, such as PC's and DEC architectures, is somewhat slowed (all the multibyte values have to be swapped). Usually, this is not noticeable, and the advantage of being able to share memory dumps across all platforms at a site far outweighs the speed loss. But if you're installing Web2c for use on LittleEndian machines only, perhaps on a PC being used only by you, you may wish to get maximum speed.
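
To give a feel for the swapping involved, here is a minimal C sketch that reverses the byte order of 32-bit words. The word size and the function names swap32 and swap32_array are assumptions made for illustration; the actual layout of dumped values in Web2c is not described here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reverse the four bytes of one 32-bit word.  */
static uint32_t
swap32 (uint32_t w)
{
  return (w >> 24)
         | ((w >> 8) & 0x0000ff00u)
         | ((w << 8) & 0x00ff0000u)
         | (w << 24);
}

/* Swap every word of an array in place, as a reader on a machine of
   the "other" endianness would have to do after loading a dump.  */
static void
swap32_array (uint32_t *words, size_t n)
{
  size_t i;

  assert (words != NULL || n == 0);
  for (i = 0; i < n; i++)
    words[i] = swap32 (words[i]);
}
```

The cost is one pass over the data per load, which is why the slowdown is usually not noticeable.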

TeXnically, even without ‘--disable-dump-share’, sharing of .fmt files cannot be guaranteed to work. Floating-point values are always written in native format, and hence will generally not be readable across platforms. Fortunately, TeX uses floating point only to represent glue ratios, and all common formats (plain, LaTeX, AMSTeX, ...) do not do any glue setting at .fmt-creation time. Metafont and MetaPost do not use floating point in any dumped value at all.

Incidentally, different memory dump files will never compare equal byte-for-byte, because the program always dumps the current date and time. So don't be alarmed by just a few bytes difference.

If you don't know what endianness your machine is, and you're curious, here is a little C program to tell you. (The configure script contains a similar program.) This is from the book C: A Reference Manual, by Samuel P. Harbison and Guy L. Steele Jr. (see References).

     #include <stdio.h>
     #include <stdlib.h>

     int
     main (void)
     {
       /* Are we little or big endian?  From Harbison&Steele.  */
       union
       {
         long l;
         char c[sizeof (long)];
       } u;
       u.l = 1;
       if (u.c[0] == 1)
         printf ("LittleEndian\n");
       else if (u.c[sizeof (long) - 1] == 1)
         printf ("BigEndian\n");
       else
         printf ("unknownEndian\n");

       /* The exit status is 1 on big-endian machines, 0 otherwise.  */
       exit (u.c[sizeof (long) - 1] == 1);
     }


3.5.3 Editor invocation

TeX, Metafont, and MetaPost all (by default) stop and ask for user intervention at an error. If the user responds with e or E, the program invokes an editor.

Specifying ‘--with-editor=cmd’ to configure sets the default editor command string to cmd. The environment variables/configuration values TEXEDIT, MFEDIT, and MPEDIT (respectively) override this. If ‘--with-editor’ is not specified, the default is vi +%d %s.

In this string, ‘%d’ is replaced by the line number of the error, and ‘%s’ is replaced by the name of the current input file.
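
The substitution can be sketched in C as follows. This is an illustration only, not the code Web2c uses: the function name expand_edit_command is hypothetical, and the sketch copies any other character (including a ‘%’ followed by something else) unchanged, which is an assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Expand an editor command template such as "vi +%d %s" into OUT:
   %d becomes LINE and %s becomes FILE.
   Returns 0 on success, -1 if OUT (of OUTSIZE bytes) is too small.  */
static int
expand_edit_command (const char *tmpl, int line, const char *file,
                     char *out, size_t outsize)
{
  size_t used = 0;

  assert (tmpl != NULL && file != NULL && out != NULL && outsize > 0);
  out[0] = '\0';
  while (*tmpl)
    {
      int n;
      if (tmpl[0] == '%' && tmpl[1] == 'd')
        {
          n = snprintf (out + used, outsize - used, "%d", line);
          tmpl += 2;
        }
      else if (tmpl[0] == '%' && tmpl[1] == 's')
        {
          n = snprintf (out + used, outsize - used, "%s", file);
          tmpl += 2;
        }
      else
        {
          n = snprintf (out + used, outsize - used, "%c", *tmpl);
          tmpl += 1;
        }
      /* snprintf reports the length it needed; reject truncation.  */
      if (n < 0 || (size_t) n >= outsize - used)
        return -1;
      used += (size_t) n;
    }
  return 0;
}
```

With the default template, line number 42, and the file foo.tex, the sketch produces the command ‘vi +42 foo.tex’.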


3.5.4 \input filenames

TeX, Metafont, and MetaPost source programs can all read other source files with the \input (TeX) and input (MF and MP) primitives:

     \input name % in TeX
     input name  % in Metafont or MetaPost

The file name can always be terminated with whitespace; for Metafont and MetaPost, the statement terminator ‘;’ also works. (LaTeX and other macro packages provide other interfaces to \input that allow different notation; here we are concerned only with the primitive operation.)

As of Web2c version 7.5.3, double-quote characters can be used to include spaces or other special cases. In typical use, the ‘"’ characters surround the entire filename:

     \input "filename with spaces"

Technically, the quote characters can be used inside the name, and can enclose any characters, as in:

     \input filename" "with" "spaces

One more point. In LaTeX, the quotes are needed inside the braces, thus

     \input{a b}    % fails
     \input{"a b"}  % ok

This quoting mechanism comes into play after TeX has tokenized and expanded the input. So, multiple spaces and tabs may be seen as a single space, active characters such as ‘~’ are expanded first, and so on. (See below.)

On the other hand, various C library routines and Unix itself use the null byte (character code zero, ASCII NUL) to terminate strings. So filenames in Web2c cannot contain nulls, even though TeX itself does not treat NUL specially. In addition, some older Unix variants do not allow eight-bit characters (codes 128–255) in filenames.

For maximal portability of your document across systems, use only the characters ‘a’–‘z’, ‘0’–‘9’, and ‘.’, and restrict your filenames to at most eight characters (not including the extension), and at most a three-character extension. Do not use anything but simple filenames, since directory separators vary among systems; instead, add the necessary directories to the appropriate search path.

Finally, the present Web2c implementation does ‘~’ and ‘$’ expansion on name, unlike Knuth's original implementation and older versions of Web2c. Thus:

     \input ~jsmith/$foo.bar

will dereference the environment variable or Kpathsea config file value ‘foo’ and read that file extended with ‘.bar’ in user ‘jsmith’'s home directory. You can also use braces, as in ‘${foo}bar’, if you want to follow the variable name with a letter, numeral, or ‘_’.

(So another way to get a program to read a filename containing whitespace is to define an environment variable and dereference it.)

In all the common TeX formats (plain TeX, LaTeX, AMSTeX), the characters ‘~’ and ‘$’ have special category codes, so to actually use these in a document you have to change their catcodes or use \string. (The result is unportable anyway; see the suggestions above.) The place where they are most likely to be useful is when typing interactively.


4 TeX: Typesetting

TeX is a typesetting system: it was especially designed to handle complex mathematics, as well as most ordinary text typesetting.

TeX is a batch language, like C or Pascal, and not an interactive “word processor”: you compile a TeX input file into a corresponding device-independent (DVI) file (and then translate the DVI file to the commands for a particular output device). This approach has both considerable disadvantages and considerable advantages. For a complete description of the TeX language, see The TeXbook (see References). Many other books on TeX, introductory and otherwise, are available.


4.1 tex invocation

TeX (usually invoked as tex) formats the given text and commands, and outputs a corresponding device-independent representation of the typeset document. This section merely describes the options available in the Web2c implementation. For a complete description of the TeX typesetting language, see The TeXbook (see References).

TeX, Metafont, and MetaPost process the command line (described here) and determine their memory dump (fmt) file in the same way (see Memory dumps). Synopses:

     tex [option]... [texname[.tex]] [tex-commands]
     tex [option]... \first-line
     tex [option]... &fmt args

TeX searches the usual places for the main input file texname (see Supported file formats), extending texname with .tex if necessary. To see all the relevant paths, set the environment variable KPATHSEA_DEBUG to ‘-1’ before running the program.

After texname is read, TeX processes any remaining tex-commands on the command line as regular TeX input. Also, if the first non-option argument begins with a TeX escape character (usually \), TeX processes all non-option command-line arguments as a line of regular TeX input.

If no arguments or options are specified, TeX prompts for an input file name with ‘**’.

TeX writes the main DVI output to the file basetexname.dvi, where basetexname is the basename of texname, or ‘texput’ if no input file was specified. A DVI file is a device-independent binary representation of your TeX document. The idea is that after running TeX, you translate the DVI file using a separate program to the commands for a particular output device, such as a PostScript printer (see Introduction) or an X Window System display (see xdvi(1)).
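
The naming rule for the DVI file can be sketched in C. This is a simplified illustration under stated assumptions: only Unix ‘/’ separators are handled, only a trailing ‘.tex’ is stripped, and the function name dvi_name is hypothetical rather than part of Web2c.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write the DVI file name for the input TEXNAME into OUT.
   A NULL or empty TEXNAME stands for "no input file specified".
   Returns 0 on success, -1 if OUT (of OUTSIZE bytes) is too small.  */
static int
dvi_name (const char *texname, char *out, size_t outsize)
{
  const char *base;
  size_t len;
  int n;

  assert (out != NULL && outsize > 0);
  if (texname == NULL || texname[0] == '\0')
    base = "texput";               /* default job name */
  else
    {
      base = strrchr (texname, '/');
      base = base ? base + 1 : texname;  /* drop any directory part */
    }

  len = strlen (base);
  if (len > 4 && strcmp (base + len - 4, ".tex") == 0)
    len -= 4;                      /* drop a trailing ".tex" */

  n = snprintf (out, outsize, "%.*s.dvi", (int) len, base);
  return (n < 0 || (size_t) n >= outsize) ? -1 : 0;
}
```

Because the directory part is dropped, the result names a file in the current directory, consistent with the output file conventions described in Output file location.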

TeX also reads TFM files for any fonts you load in your document with the \font primitive. By default, it runs an external program named mktextfm to create any nonexistent TFM files. You can disable this at configure-time or runtime (see mktex configuration). This is enabled mostly for the sake of the EC fonts, which can be generated at any size.

TeX can write output files, via the \openout primitive; this opens a security hole vulnerable to Trojan horse attack: an unwitting user could run a TeX program that overwrites, say, ~/.rhosts. (MetaPost has a write primitive with similar implications.) To alleviate this, there is a configuration variable openout_any, which selects one of three levels of security. When it is set to ‘a’ (for “any”), no restrictions are imposed. When it is set to ‘r’ (for “restricted”), filenames beginning with ‘.’ are disallowed (except .tex because LaTeX needs it). When it is set to ‘p’ (for “paranoid”) additional restrictions are imposed: an absolute filename must refer to a file in (a subdirectory of) TEXMFOUTPUT, and any attempt to go up a directory level is forbidden (that is, paths may not contain a ‘..’ component). The paranoid setting is the default. (For backwards compatibility, ‘y’ and ‘1’ are synonyms of ‘a’, while ‘n’ and ‘0’ are synonyms for ‘r’.)

In any case, all \openout filenames are recorded in the log file, except those opened on the first line of input, which is processed when the log file has not yet been opened. (If you as a TeX administrator wish to implement more stringent rules on \openout, modifying the function openoutnameok in web2c/lib/texmfmp.c is intended to suffice.)
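
The three openout_any levels described above can be sketched as a C predicate. This is a hypothetical illustration, not the code in web2c/lib/texmfmp.c: the name openout_ok_sketch is invented, the ‘.’ rule is applied to the last path component only, unrecognized level letters are treated as paranoid, and TEXMFOUTPUT is assumed to be passed without a trailing ‘/’.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if NAME contains a ".." path component, else 0.  */
static int
has_dotdot_component (const char *name)
{
  const char *p = name;
  for (;;)
    {
      const char *end = strchr (p, '/');
      size_t len = end ? (size_t) (end - p) : strlen (p);
      if (len == 2 && p[0] == '.' && p[1] == '.')
        return 1;
      if (end == NULL)
        return 0;
      p = end + 1;
    }
}

/* Return 1 if writing NAME is allowed at security LEVEL ('a', 'r',
   or 'p', plus the backwards-compatible synonyms), else 0.
   TEXMFOUTPUT may be NULL if that value is not set.  */
static int
openout_ok_sketch (const char *name, char level, const char *texmfoutput)
{
  const char *base;

  assert (name != NULL);
  if (level == 'y' || level == '1')
    level = 'a';
  else if (level == 'n' || level == '0')
    level = 'r';

  if (level == 'a')
    return 1;                      /* any: no restrictions */

  /* restricted: no dot files, except ".tex" itself */
  base = strrchr (name, '/');
  base = base ? base + 1 : name;
  if (base[0] == '.' && strcmp (base, ".tex") != 0)
    return 0;
  if (level == 'r')
    return 1;

  /* paranoid: no "..", and absolute names only below TEXMFOUTPUT */
  if (has_dotdot_component (name))
    return 0;
  if (name[0] == '/')
    {
      size_t n;
      if (texmfoutput == NULL || texmfoutput[0] == '\0')
        return 0;
      n = strlen (texmfoutput);
      if (strncmp (name, texmfoutput, n) != 0 || name[n] != '/')
        return 0;
    }
  return 1;
}
```

Under these assumptions, ‘foo.aux’ is accepted at every level, ‘.rhosts’ only at level ‘a’, and ‘../foo.aux’ at every level except ‘p’.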

The program accepts the following options, as well as the standard ‘-help’ and ‘-version’ (see Common options):

-enc
-[no-]file-line-error
-fmt=fmtname
-halt-on-error
-ini
-interaction=string
-ipc
-ipc-start
-jobname=string
-kpathsea-debug=number
-[no-]parse-first-line
-output-directory
-progname=string
-recorder
-translate-file=tcxfile
-8bit
These options are common to TeX, Metafont, and MetaPost. See Common options.
-enc
Enable encTeX extensions, such as \mubyte. This can be used to support Unicode UTF-8 input encoding. See http://www.olsak.net/enctex.html.
-ipc
-ipc-start
With either option, TeX writes its DVI output to a socket as well as to the usual .dvi file. With ‘-ipc-start’, TeX also opens a server program at the other end to read the output. See IPC and TeX.

These options are available only if the ‘--enable-ipc’ option was specified to configure during installation of Web2c.

-mktex=filetype
-no-mktex=filetype
Turn on or off the ‘mktex’ script associated with filetype. The only values that make sense for filetype are ‘tex’ and ‘tfm’.
-mltex
If we are INITEX (see Initial and virgin), enable MLTeX extensions such as \charsubdef. Implicitly set if the program name is mltex. See MLTeX.
-output-comment=string
Use string as the DVI file comment. Ordinarily, this comment records the date and time of the TeX run, but if you are doing regression testing, you may not want the DVI file to have this spurious difference. This is also taken from the environment variable and config file value ‘output_comment’.
-shell-escape
-no-shell-escape
Enable (or disable) the ‘\write18{shell-command}’ feature. This is also enabled if the environment variable or config file value ‘shell_escape’ is set to ‘t’ (but the ‘-no-shell-escape’ command line option overrides this). (For backwards compatibility, ‘y’ and ‘1’ are accepted as synonyms of ‘t’). It is disabled by default to avoid security problems. When enabled, the shell-command string (which first undergoes the usual TeX expansions, just as in ‘\special’) is passed to the command shell (via the C library function ‘system’). The output of shell-command is not diverted anywhere, so it will not appear in the log file. The system call either happens at ‘\output’ time or right away, according to the absence or presence of the ‘\immediate’ prefix, as usual for \write. (If you as a TeX administrator wish to implement more stringent rules on what can be executed, you will need to modify tex.ch.)
-src-specials
-src-specials=string
This option makes TeX output specific source information using ‘\special’ commands in the DVI file. These ‘\special’ commands track the current file name and line number.

Using the first form of this option, the ‘\special’ commands are inserted automatically.

In the second form of the option, string is a comma separated list of the following values: ‘cr’, ‘display’, ‘hbox’, ‘math’, ‘par’, ‘parend’, ‘vbox’. You can use this list to specify where you want TeX to output such commands. For example, ‘-src-specials=cr,math’ will output source information every line and every math formula.

These commands can be used with the appropriate DVI viewer and text editor to switch from the current position in the editor to the same position in the viewer and back from the viewer to the editor.

This option works by inserting ‘\special’ commands into the token stream, and thus in principle these additional tokens can be recovered or seen by sufficiently tricky macros. If you run across such a case, let us know, because this counts as a bug. However, such bugs are very hard to fix, requiring significant changes to TeX, so please don't count on a fix.

Redefining ‘\special’ will not affect the functioning of this option. The commands inserted into the token stream are hard-coded to always use the ‘\special’ primitive.

TeX does not pass the trip test when this option is enabled.


4.2 Initial TeX

The initial form of TeX is invoked by ‘tex -ini’. It does lengthy initializations avoided by the “virgin” (vir) form, so as to be capable of dumping ‘.fmt’ files (see Memory dumps). For a detailed comparison of virgin and initial forms, see Initial and virgin. In past releases, a separate program initex was installed to invoke the initial form, but this is no longer the case.

For a list of options and other information, see tex invocation.

Unlike Metafont and MetaPost, many format files are commonly used with TeX. The standard one implementing the features described in the TeXbook is ‘plain.fmt’, also known as ‘tex.fmt’ (again, see Memory dumps). It is created by default during installation, but you can also do so by hand if necessary (e.g., if an update to plain.tex is issued):

     tex -ini '\input plain \dump'

(The quotes prevent interpretation of the backslashes by the shell.) Then install the resulting plain.fmt in ‘$(fmtdir)’ (/usr/local/share/texmf/web2c by default), and link tex.fmt to it.

The necessary invocation for generating a format file differs for each format, so instructions that come with the format should explain. The top-level web2c Makefile has targets for making most common formats: plain latex amstex texinfo eplain. See Formats, for more details on TeX formats.


4.3 Formats

TeX formats are large collections of macros, often dumped into a .fmt file (see Memory dumps) by tex -ini (see Initial TeX). A number of formats are in reasonably widespread use, and the Web2c Makefile has targets to make the versions current at the time of release. You can change which formats are automatically built by setting the fmts Make variable; by default, only the ‘plain’ and ‘latex’ formats are made.

You can get the latest versions of most of these formats from the CTAN archives in subdirectories of CTAN:/macros (for CTAN info, see unixtex.ftp). The archive ftp://ftp.tug.org/tex/lib.tar.gz (also available from CTAN) contains most of these formats (although perhaps not the absolute latest version), among other things.

latex
The most widely used format. The current release is named `LaTeX 2e'; new versions are released approximately every six months, with patches issued as needed. The old release was called `LaTeX 2.09', and is no longer maintained or supported. LaTeX attempts to provide generic markup instructions, such as “emphasize”, instead of specific typesetting instructions, such as “use the 10pt Computer Modern italic font”. The LaTeX home page: http://www.latex-project.org.
context
ConTeXt is an independent macro package which has a basic document structuring approach similar to LaTeX. It also supports creating interactive PDF files and has integrated MetaPost support, among many other interesting features. The ConTeXt home page: http://www.pragma-ade.com.
amstex
The official typesetting system of the American Mathematical Society. Like LaTeX, it encourages generic markup commands. The AMS also provides many LaTeX packages for authors who prefer LaTeX. Taken together, they are used to produce nearly all AMS publications, e.g., Mathematical Reviews. The AMSTeX home page: http://www.ams.org/tex.
texinfo
The documentation system developed and maintained by the Free Software Foundation for their software manuals. It can be automatically converted into plain text, a machine-readable on-line format called `info', HTML, etc. The Texinfo home page: http://www.gnu.org/software/texinfo.
eplain
The “expanded plain” format provides various common features (e.g., symbolic cross-referencing, tables of contents, indexing, citations using BibTeX), for those authors who prefer to handle their own high-level formatting. The Eplain home page: http://www.tug.org/eplain.
slitex
An obsolete LaTeX 2.09 format for making slides. It is replaced by the ‘slides’ document class, along with the ‘beamer’, ‘texpower’, and other packages.


4.4 Languages and hyphenation

TeX supports most natural languages. See also TeX extensions.


4.4.1 MLTeX: Multi-lingual TeX

Multi-lingual TeX (mltex) is an extension of TeX originally written by Michael Ferguson and now updated and maintained by Bernd Raichle. It allows the use of non-existing glyphs in a font by declaring glyph substitutions. These are restricted to substitutions of an accented character glyph, which need not be defined in the current font, by its appropriate \accent construction using a base and accent character glyph, which do have to exist in the current font. This substitution is automatically done behind the scenes, if necessary, and thus MLTeX additionally supports hyphenation of words containing an accented character glyph for fonts missing this glyph (e.g., Computer Modern). Standard TeX suppresses hyphenation in this case.

MLTeX works at .fmt-creation time: the basic idea is to specify the ‘-mltex’ option to TeX when you \dump a format. Then, when you subsequently invoke TeX and read that .fmt file, the MLTeX features described below will be enabled.

Generally, you use special macro files to create an MLTeX .fmt file.

The sections below describe the two new primitives that MLTeX defines. Aside from these, MLTeX is completely compatible with standard TeX.


4.4.1.1 \charsubdef: Character substitutions

The most important primitive MLTeX adds is \charsubdef, used in a way reminiscent of \chardef:

     \charsubdef composite [=] accent base

Each of composite, accent, and base is a font glyph number, expressed in the usual TeX syntax: `\e symbolically, '145 for octal, "65 for hex, 101 for decimal.
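
The four notations above all denote the same glyph number. The following Python sketch converts each form to an integer so the equivalence can be checked; the function name tex_number is hypothetical, and the sketch covers only these four forms, not TeX's full number syntax.

```python
def tex_number(text):
    """Convert a TeX-style character number to an int (illustrative only)."""
    text = text.strip()
    if text.startswith("`"):
        # Symbolic form: a backquote followed by a character,
        # optionally written as a one-character control sequence.
        char = text[1:]
        if len(char) == 2 and char[0] == "\\":
            char = char[1]
        if len(char) != 1:
            raise ValueError("expected a single character after the backquote")
        return ord(char)
    if text.startswith("'"):
        return int(text[1:], 8)    # octal
    if text.startswith('"'):
        return int(text[1:], 16)   # hexadecimal
    return int(text, 10)           # decimal
```

With this helper, the symbolic, octal, hex, and decimal forms given above all evaluate to 101.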

MLTeX's \charsubdef declares how to construct an accented character glyph (not necessarily existing in the current font) using two character glyphs (that do exist). Thus it defines whether a character glyph code, either typed as a single character or using the \char primitive, will be mapped to a font glyph or to an \accent glyph construction.

For example, if you assume glyph code 138 (decimal) for an e-circumflex and you are using the Computer Modern fonts, which have the circumflex accent in position 18 and lowercase `e' in the usual ASCII position 101 decimal, you would use \charsubdef as follows:

     \charsubdef 138 = 18 101

For the plain TeX format to make use of this substitution, you have to redefine the circumflex accent macro \^ in such a way that if its argument is character `e' the expansion \char138 is used instead of \accent18 e. Similar \charsubdef declarations and macro redefinitions have to be done for all other accented characters.

To disable a previous \charsubdef c, redefine c as a pair of zeros. For example:

     \charsubdef '321 = 0 0  % disable N tilde

(Octal '321 is the ISO Latin-1 value for the Spanish N tilde.)
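
As a rough model of the bookkeeping just described, the Python sketch below keeps a table of substitutions and decides how a character code would be realized in a given font. The class and method names are hypothetical, a font is modeled simply as a set of available glyph codes, and preferring a real glyph when the font has one is a simplifying assumption rather than a statement about MLTeX's internals.

```python
class CharSubTable:
    """Toy model of \\charsubdef declarations (illustrative only)."""

    def __init__(self):
        self._subs = {}  # composite code -> (accent code, base code)

    def charsubdef(self, composite, accent, base):
        # A pair of zeros disables an earlier declaration.
        if accent == 0 and base == 0:
            self._subs.pop(composite, None)
        else:
            self._subs[composite] = (accent, base)

    def realize(self, code, font_glyphs):
        """Describe how `code` would be typeset with the given glyph set."""
        if code in font_glyphs:
            return ("char", code)
        sub = self._subs.get(code)
        if sub and sub[0] in font_glyphs and sub[1] in font_glyphs:
            return ("accent", sub[0], sub[1])
        return ("missing", code)
```

With a 128-glyph font such as Computer Modern, declaring charsubdef(138, 18, 101) makes realize(138, ...) report an accent construction; after charsubdef(138, 0, 0) the same code is reported as missing.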

\charsubdef commands should only be given once. Although in principle you can use \charsubdef at any time, the result is unspecified. If \charsubdef declarations are changed, usually either incorrect character dimensions will be used or MLTeX will output missing character warnings. (The substitution of a \charsubdef is used by TeX when appending the character node to the current horizontal list, to compute the width of a horizontal box when the box gets packed, and when building the \accent construction at \shipout-time. In summary, the substitution is accessed often, so changing it is not desirable, nor generally useful.)


4.4.1.2 \tracingcharsubdef: Substitution diagnostics

To help diagnose problems with ‘\charsubdef’, MLTeX provides a new primitive parameter, \tracingcharsubdef. If positive, every use of \charsubdef will be reported. This can help track down when a character is redefined.

In addition, if the TeX parameter \tracinglostchars is 100 or more, the character substitutions actually performed at \shipout-time will be recorded.


4.4.2 TCX files: Character translations

TCX (TeX character translation) files help TeX support direct input of 8-bit international characters if fonts containing those characters are being used. Specifically, they map an input (keyboard) character code to the internal TeX character code (a superset of ASCII).

Of the various proposals for handling more than one input encoding, TCX files were chosen because they follow Knuth's original ideas for the use of the ‘xchr’ and ‘xord’ tables. He ventured that these would be changed in the WEB source in order to adjust the actual version to a given environment. It turns out, however, that recompiling the WEB sources is not as simple a task as Knuth may have imagined; therefore, TCX files, which make it possible to change the conversion tables on-the-fly, have been implemented instead.

This approach limits the portability of TeX documents, as some implementations do not support it (or use a different method for input-internal reencoding). It may also be problematic to determine the encoding to use for a TeX document of unknown provenance; in the worst case, failure to do so correctly may result in subtle errors in the typeset output. But we feel the benefits outweigh these disadvantages.

This is entirely independent of the MLTeX extension (see MLTeX): whereas a TCX file defines how an input keyboard character is mapped to TeX's internal code, MLTeX defines substitutions for a non-existing character glyph in a font with a \accent construction made out of two separate character glyphs. TCX files involve no new primitives; it is not possible to specify that an input (keyboard) character maps to more than one character.

Information on specifying TCX files:

The Web2c distribution comes with a number of TCX files. Two important ones are il1-t1.tcx and il2-t1.tcx, which support ISO Latin 1 and ISO Latin 2, respectively, with Cork-encoded fonts (a.k.a. the LaTeX T1 encoding). TCX files for Czech, Polish, and Slovak are also provided.

One other notable TCX file is empty.tcx, which is, well, empty. Its purpose is to reset Web2c's behavior to the default (only visible ASCII being printable, as described below) when a format was dumped with another TCX being active, which is in fact the case for everything but plain TeX in TeX Live and other distributions. Thus:

     latex somefile8.tex
     ⇒ terminal etc. output with 8-bit chars
     latex --translate-file=empty.tcx somefile8.tex
     ⇒ terminal etc. output with ^^ notation

Syntax of TCX files:

  1. Line-oriented. Blank lines are ignored.
  2. Whitespace is ignored except as a separator.
  3. Comments start with ‘%’ and continue to the end of the line.
  4. Otherwise, a line consists of one or two character codes, optionally followed by 0 or 1. The last number indicates whether dest is considered printable.
              src [dest [prnt]]
    
  5. Each character code may be specified in octal with a leading ‘0’, hexadecimal with a leading ‘0x’, or decimal otherwise. Values must be between 0 and 255, inclusive (decimal).
  6. If the dest code is not specified, it is taken to be the same as src.
  7. If the same src code is specified more than once, it is the last definition that counts.

Finally, here's what happens: when TeX sees an input character with code src, it 1) changes src to dest; and 2) makes the dest code “printable”, i.e., printed as-is in diagnostics and the log file rather than in ‘^^’ notation.

By default, no characters are translated, and character codes between 32 and 126 inclusive (decimal) are printable.
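
The syntax rules and defaults above can be summarized in a short Python sketch of a TCX reader. It is an illustration of the documented format, not the code Web2c uses; the names parse_tcx and _code are hypothetical, and treating an omitted prnt field as 1 is an assumption drawn from the statement that dest becomes printable.

```python
def _code(token):
    """Parse one character code: 0x.. hex, leading-0 octal, else decimal."""
    low = token.lower()
    if low.startswith("0x"):
        value = int(low[2:], 16)
    elif low.startswith("0") and len(low) > 1:
        value = int(low[1:], 8)
    else:
        value = int(low, 10)
    if not 0 <= value <= 255:
        raise ValueError("character code out of range: " + token)
    return value


def parse_tcx(text):
    """Return (translate, printable) for the contents of a TCX file."""
    entries = {}
    for raw in text.splitlines():
        fields = raw.split("%", 1)[0].split()  # drop comments, split on whitespace
        if not fields:
            continue                           # blank or comment-only line
        if len(fields) > 3:
            raise ValueError("too many fields: " + raw)
        src = _code(fields[0])
        dest = _code(fields[1]) if len(fields) > 1 else src
        prnt = fields[2] if len(fields) > 2 else "1"
        if prnt not in ("0", "1"):
            raise ValueError("prnt must be 0 or 1: " + raw)
        entries[src] = (dest, prnt == "1")     # the last definition of src wins

    translate = {}
    printable = set(range(32, 127))            # default: codes 32..126
    for src, (dest, is_printable) in entries.items():
        translate[src] = dest
        if is_printable:
            printable.add(dest)
        else:
            printable.discard(dest)
    return translate, printable
```

Given the two lines ‘0xe9 0xea’ and ‘0241 0x8a 0’, this sketch maps code 233 to 234 (printable) and code 161 to 138 (not printable).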

Specifying translations for the printable ASCII characters (codes 32–127) will yield unpredictable results. Additionally you shouldn't make the following characters printable: ^^I (TAB), ^^J (line feed), ^^M (carriage return), and ^^? (delete), since TeX uses them in various ways.

Thus, the idea is to specify the input (keyboard) character code for src, and the output (font) character code for dest.

By default, only the printable ASCII characters are considered printable by TeX. If you specify the ‘-8bit’ option, all characters are considered printable by default. If you specify both the ‘-8bit’ option and a TCX file, then the TCX can set specific characters to be non-printable.

Both the specified TCX encoding and whether characters are printable are saved in the dump files (like tex.fmt). So by giving these options in combination with ‘-ini’, you control the defaults seen by anyone who uses the resulting dump file.

When loading a dump, if the ‘-8bit’ option was given, then all characters become printable by default.

When loading a dump, if a TCX file was specified, then the TCX data from the dump is ignored and the data from the file used instead.


4.4.3 Patgen: Creating hyphenation patterns

Patgen creates hyphenation patterns from dictionary files for use with TeX. Synopsis:

     patgen dictionary patterns output translate

Each argument is a filename. No path searching is done. The output is written to the file output.

In addition, Patgen prompts interactively for other values.

For more information, see Word hy-phen-a-tion by com-puter by Frank Liang (see References), and also the patgen.web source file.

The only options are ‘-help’ and ‘-version’ (see Common options).


4.5 IPC and TeX

(Sorry, but I'm not going to write this unless someone actually uses this feature. Let me know.)

This functionality is available only if the ‘--enable-ipc’ option was specified to configure during installation of Web2c (see Installation).

If you define IPC_DEBUG before compilation (e.g., with ‘make XCFLAGS=-DIPC_DEBUG’), TeX will print messages to standard error about its socket operations. This may be helpful if you are, well, debugging.


4.6 TeX extensions

The base TeX program has been extended in many ways. Here's a partial list. Please send information on extensions not listed here to the address in Reporting bugs.

e-TeX
Adds many new primitives, including right-to-left typesetting. Available from http://www.vms.rhbnc.ac.uk/e-TeX/ and CTAN:/systems/e-tex.
Omega
Adds Unicode support, right-to-left typesetting, and more. Available from http://www.ens.fr/omega and CTAN:/systems/omega.
pdfTeX
A variant of TeX that produces PDF instead of DVI files. It also includes primitives for hypertext and micro-typography. Available from CTAN:/systems/pdftex.
TeX--XeT
Adds primitives and DVI opcodes for right-to-left typesetting (as used in Arabic, for example). An old version for TeX 3.1415 is available from CTAN:/systems/knuth/tex--xet. A newer version is included in e-TeX.
File-handling TeX
Adds primitives for creating multiple DVI files in a single run; and appending to output files as well as overwriting. Web2c implementation available in the distribution file web2c/contrib/file-handling-tex.


5 Metafont: Creating typeface families

Metafont is a system for producing shapes; it was designed for producing complete typeface families, but it can also produce geometric designs, dingbats, etc. And it has considerable mathematical and equation-solving capabilities which can be useful entirely on their own.

Metafont is a batch language, like C or Pascal: you compile a Metafont program into a corresponding font, rather than interactively drawing lines or curves. This approach has both considerable disadvantages (people unfamiliar with conventional programming languages will be unlikely to find it usable) and considerable advantages (you can make your design intentions specific and parameterizable). For a complete description of the Metafont language, see The METAFONTbook (see References).


5.1 mf invocation

Metafont (usually invoked as mf) reads character definitions specified in the Metafont programming language, and outputs the corresponding font. This section merely describes the options available in the Web2c implementation. For a complete description of the Metafont language, see The Metafontbook (see References).

Metafont processes its command line and determines its memory dump (base) file in a way exactly analogous to MetaPost and TeX (see tex invocation, and see Memory dumps). Synopses:

     mf [option]... [mfname[.mf]] [mf-commands]
     mf [option]... \first-line
     mf [option]... &base args

Most commonly, a Metafont invocation looks like this:

     mf '\mode:=mode; mag:=magnification; input mfname'

(The single quotes avoid unwanted interpretation by the shell.)

Metafont searches the usual places for the main input file mfname (see Supported file formats), extending mfname with .mf if necessary. To see all the relevant paths, set the environment variable KPATHSEA_DEBUG to ‘-1’ before running the program. By default, Metafont runs an external program named mktexmf to create any nonexistent Metafont source files you input. You can disable this at configure-time or runtime (see mktex configuration). This is mostly for the sake of the EC fonts, which can be generated at any size.

Metafont writes the main GF output to the file basemfname.nnngf, where nnn is the font resolution in pixels per inch, and basemfname is the basename of mfname, or ‘mfput’ if no input file was specified. A GF file contains bitmaps of the actual character shapes. Usually GF files are converted immediately to PK files with GFtoPK (see gftopk invocation), since PK files contain equivalent information, but are more compact. (Metafont outputs GF format rather than PK for historical reasons only.)

Metafont also usually writes a metric file in TFM format to basemfname.tfm. A TFM file contains character dimensions, kerns, ligatures, and spacing parameters. TeX reads only this .tfm file, not the GF file.

The mode in the example command above is a name referring to a device definition (see Modes); for example, localfont or ljfour. These device definitions must generally be precompiled into the base file. If you leave this out, the default is proof mode, as stated in The Metafontbook, in which Metafont outputs at a resolution of 2602dpi; this is usually not what you want. The remedy is simply to assign a different mode—localfont, for example.

The magnification assignment in the example command above is a magnification factor; for example, if the device is 600dpi and you specify mag:=2, Metafont will produce output at 1200dpi. Very often, the magnification is an expression such as magstep(.5), corresponding to a TeX “magstep”; magsteps are powers of 1.2, so magstep(.5) is the square root of 1.2, about 1.095.
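
The arithmetic behind the resolution and the GF file name can be sketched in Python. A TeX magstep n is a magnification of 1.2 raised to the power n; the function names below are hypothetical, and rounding the resolution to the nearest integer for the file name is an assumption of this sketch.

```python
def magstep(n):
    """Magnification factor for TeX magstep n, i.e. 1.2 to the power n."""
    return 1.2 ** n


def output_dpi(device_dpi, mag=1.0):
    """Resolution Metafont renders at for a device and magnification."""
    return device_dpi * mag


def gf_filename(basename, dpi):
    """GF output name of the form basemfname.nnngf (rounding assumed)."""
    return "{}.{}gf".format(basename, round(dpi))
```

For a 600dpi device, mag:=2 gives 1200dpi and a file such as cmr10.1200gf, while magstep(.5) gives about 657dpi and cmr10.657gf.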

After running Metafont, you can use the font in a TeX document as usual. For example:

     \font\myfont = newfont
     \myfont Now I am typesetting in my new font (minimum hamburgers).

The program accepts the following options, as well as the standard ‘-help’ and ‘-version’ (see Common options):

-[no]-file-line-error
-fmt=fmtname
-halt-on-error
-ini
-interaction=string
-jobname=string
-kpathsea-debug=number
-[no-]parse-first-line
-output-directory
-progname=string
-recorder
-translate-file=tcxfile
-8bit
These options are common to TeX, Metafont, and MetaPost. See Common options.
-mktex=filetype
-no-mktex=filetype
Turn on or off the ‘mktex’ script associated with filetype. The only value that makes sense for filetype is ‘mf’.


5.2 Initial Metafont

The initial form of Metafont is invoked by ‘mf -ini’. It does lengthy initializations avoided by the “virgin” (vir) form, so as to be capable of dumping ‘.base’ files (see Memory dumps). For a detailed comparison of virgin and initial forms, see Initial and virgin. In past releases, a separate program inimf was installed to invoke the initial form, but this is no longer the case.

For a list of options and other information, see mf invocation.

The only memory dump file commonly used with Metafont is the default ‘plain.base’, also known as ‘mf.base’ (again, see Memory dumps). It is created by default during installation, but you can also do so by hand if necessary (e.g., if a Metafont update is issued):

     mf -ini '\input plain; input modes; dump'

(The quotes prevent interpretation of the backslashes by the shell.) Then install the resulting plain.base in ‘$(basedir)’ (/usr/local/share/texmf/web2c by default), and link mf.base to it.

For an explanation of the additional modes.mf file, see Modes. This file has no counterpart in TeX or MetaPost.

In the past, it was sometimes useful to create a base file cmmf.base (a.k.a. cm.base), with the Computer Modern macros also included in the base file. Nowadays, however, the additional time required to read cmbase.mf is exceedingly small, usually not enough to be worth the administrative hassle of updating the cmmf.base file when you install a new version of modes.mf. People actually working on a typeface may still find it worthwhile to create their own base file, of course.


5.3 Modes: Device definitions for Metafont

Running Metafont and creating Metafont base files requires information that TeX and MetaPost do not: mode definitions which specify device characteristics, so Metafont can properly rasterize the shapes.

When making a base file, a file containing modes for locally-available devices should be input after plain.mf. One commonly used file is ftp://ftp.tug.org/tex/modes.mf; it includes all known definitions.

If, however, for some reason you have decreased the memory available in your Metafont, you may need to copy modes.mf and remove the definitions irrelevant to you (probably most of them) instead of using it directly. (Or, if you're a Metafont hacker, maybe you can suggest a way to redefine mode_def and/or mode_setup; right now, the amount of memory used is approximately four times the total length of the mode_def names, and that's a lot.)

If you have a device not included in mode