[pdftex] Using truetype fonts with pdftex

Thu Nov 20 18:47:04 CET 2008

Hi,

the question of how to use truetype fonts with pdftex keeps popping up,
so I wrote some (draft) notes that are meant to be added to the official
pdftex manual. Comments and suggestions are welcome.

Regards,
Thanh

======================================================
\subsection{A closer look at TrueType fonts and pdftex}
The most common outline format for tex is Type~1 (t1 below). TrueType (ttf)
fonts are slightly different from t1 and hence require some extra work to
get right. An important step when dealing with ttf is to understand how t1
and ttf handle encodings and glyph names (or more precisely, glyph
``identity'').

t1 is familiar to tex users, so let's start with it: t1 refers to
glyphs by name, i.e. each glyph is identified by its name, like
`/A', `/comma', etc. Given a glyph name, it's easy to tell whether a t1 font
contains that glyph or not. Encoding with t1 is therefore simple: given a
number (in the range 0..255), an encoding tells us the glyph name, and the
name identifies the glyph.

With ttf the situation is not that simple, since ttf doesn't use names to
refer to glyphs, but ``indices''. This means that each glyph
is identified by its index, not its name. The indices are numbers that
differ from font to font. ttf handles encodings by a mechanism called
``cmap'', which is roughly a set of tables, each mapping character codes
to glyph indices. A ttf font can contain one or more such tables (each
corresponding to an encoding).
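
The difference between the two models can be sketched with a few lines of
Python. This is a toy illustration only: the names, codes, indices and the
helper functions t1_glyph and ttf_glyph are invented here and are not read
from any real font or taken from pdftex.

```python
# Toy data only: all names, codes and indices below are invented for
# illustration and do not come from a real font.

# t1: an encoding maps a character code (0..255) to a glyph name,
# and the font maps glyph names to outlines.
t1_encoding = {65: "A", 44: "comma"}
t1_font = {"A": "<outline of A>", "comma": "<outline of comma>"}


def t1_glyph(code):
    """Return the outline for a code, or None if it cannot be resolved."""
    name = t1_encoding.get(code)
    return t1_font.get(name)


# ttf: glyphs are stored in an array and addressed by index; each cmap
# table maps the character codes of one encoding directly to indices.
ttf_glyphs = ["<.notdef>", "<outline of comma>", "<outline of A>"]
ttf_cmaps = {
    "unicode": {0x0041: 2, 0x002C: 1},
    "macroman": {65: 2, 44: 1},
}


def ttf_glyph(cmap_name, code):
    """Return the outline for a code in the given cmap table, or None."""
    index = ttf_cmaps[cmap_name].get(code)
    return None if index is None else ttf_glyphs[index]
```

Note that no glyph name appears anywhere in the ttf half of the sketch,
which is why names can be wrong or missing without anybody noticing.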

Since glyph names are not strictly necessary for ttf, they are not always
available inside a ttf font. Given a ttf font, one of the following cases
may occur:

\startitemize[a,packed]
\item the font contains correct names for all its glyphs. This is the ideal
    situation and is often the case for high-quality latin fonts.

\item the font contains wrong names for all or most of its glyphs. This is
    the worst situation; it often happens with poor-quality fonts, or
    fonts converted from other formats.

\item the font contains no glyph names at all. A newer version of the
    Palatino fonts by Linotype (v1.40, shipped with Windows XP) is one example.

\item the font contains correct names for most glyphs, and no names or
    wrong names for a few glyphs. This happens from time to time.
\stopitemize

One may wonder how things can be so complex with glyph names in ttf. The
reason is that t1 fonts rely on correct names to work properly. If a glyph
has a wrong name, it gets noticed immediately. ttf, as mentioned before,
doesn't use names for its encoding. So, if glyph names in a ttf font are
wrong or missing, it's usually not a big deal and often goes unnoticed.

The potential problem with using ttf in pdftex is that we are so used to
the t1 encoding convention, which relies on correct glyph names. Most font
tools also rely on this convention; all encoding files (.enc files) use
glyph names, too. But as discussed above, glyph names in ttf are not
very reliable. If we encounter a font that doesn't have correct
names for all glyphs, we need to do some more work.

If glyph names are not correct, we need a better way to refer to a glyph in
ttf fonts than using its name. The most reliable way seems to be via
Unicode: most ttf fonts provide a correct mapping from unicode to glyph
index. This is something we can count on, since it is required for a ttf to
be usable.

From version 1.21a pdftex supports the naming convention `uniXXXX' in
encoding files. This only makes sense with ttf fonts, of course. When
pdftex sees for example `uni12AB', it will:

\startitemize[a,packed]
\item read the table <unicode> -> <glyph-index> from the font;
\item look up the value `12AB' in the table; if found, pick the
    relevant glyph index.
\stopitemize
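
The lookup just described can be sketched as follows. This is only an
illustration, not pdftex's actual code: the function name lookup_uni_name is
made up, the unicode table is passed in as a plain dictionary, and the sketch
assumes that XXXX is exactly four uppercase hexadecimal digits.

```python
import re


def lookup_uni_name(name, unicode_cmap):
    """Resolve a `uniXXXX' name to a glyph index.

    unicode_cmap is the <unicode> -> <glyph-index> table of the font,
    given here as a dict {code point: glyph index}.
    Returns None if the name has another form or the code is not mapped.
    """
    # Assumption of this sketch: exactly four uppercase hex digits.
    m = re.fullmatch(r"uni([0-9A-F]{4})", name)
    if m is None:
        return None
    return unicode_cmap.get(int(m.group(1), 16))
```

For a table that maps U+12AB to glyph 7, the call
lookup_uni_name("uni12AB", {0x12AB: 7}) returns 7, while a name such as
`grave' is not handled by this path at all.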

ttf2afm also does the same lookup when it sees names like `uni12AB'.

Now let's review the minimal steps to get a ttf font working with pdftex:
\startitemize[a,packed]
\item generate an afm from ttf using ttf2afm. Example:
    \starttyping
    ttf2afm -e 8r.enc -o times.afm times.ttf
    \stoptyping
\item convert afm to tfm using whatever tool is suitable: afm2tfm, fontinst,
    afm2pl, etc. Example:
    \starttyping
    afm2tfm times.afm -T 8r.enc
    \stoptyping
\item create the needed map entry for the font. Example:
    \starttyping
    \pdfmapline{+times TimesNewRomanPSMT <8r.enc <times.ttf}
    \stoptyping
\stopitemize

That was the easiest case, where glyph names are correct. Now let's take a
font where we cannot rely on glyph names, for example Palatino by Linotype,
version 1.40. Let's assume we want to use the T1 encoding (the font encoding
described by ec.enc, not to be confused with the t1 font format) with this
font. So we put pala.ttf and ec.enc in the current directory before
proceeding.

The first attempt would be:
\starttyping
ttf2afm -e ec.enc -o pala.afm pala.ttf
\stoptyping

However, since the names in ec.enc are not available in pala.ttf (in fact
there are no names inside the font), we would get a bunch of warnings:
\starttyping
Warning: ttf2afm (file pala.ttf): no names available in `post' table, print
glyph names as indices

Warning: ttf2afm (file pala.ttf): glyph `grave' not found
.
.
.
\stoptyping

and the output pala.afm will contain no names at all. Instead of the glyph
names from ec.enc, we get weird things like `index123', and none of the
glyphs is encoded:
\starttyping
C -1 ; WX 832 ; N index10 ; B 24 -3 807 689 ;
.
.
.
\stoptyping

We try again, this time without giving encoding:
\starttyping
ttf2afm -o pala.afm pala.ttf
\stoptyping

This time, since we didn't ask ttf2afm to re-encode the output afm, we get
fewer warnings:
\starttyping
Warning: ttf2afm (file pala.ttf): no names available in `post' table, print
glyph names as indices
\stoptyping

and the afm output is the same as in the previous attempt, which is not
very useful, since there is little we can do with names like
`index123'.

So we try to go with Unicode:

\starttyping
ttf2afm -u -o pala.afm pala.ttf
\stoptyping

This time we get another bunch of warnings like:
\starttyping
Warning: ttf2afm (file pala.ttf): glyph 108 have multiple encodings (the
first one being used): uni0162 uni021A
.
.
\stoptyping

At first sight it is hard to understand what ttf2afm is telling us with this
message. So let's recap the connection between glyph name, glyph index and
unicode:
\startitemize[a,packed]
\item glyphs are identified internally by index.
\item <glyph-name> -> <glyph-index> is optional, and not always reliable.
    So is <glyph-index> -> <glyph-name>.
\item <unicode> -> <glyph-index> is (almost) always present and reliable.
\item <glyph-index> -> <unicode> is not always reliable, and it's not even
    a mapping, since more than one unicode can map to the same glyph
    index. Therefore, given a glyph index, it's not always possible to get
    a single unicode corresponding to that index: there can be none, or
    more than one. If there is none, the glyph index is used (e.g.
    `index123'). If there are several, as in this case where the two
    unicodes 0162 and 021A are mapped to glyph index 108, ttf2afm cannot
    know which value to print when asked to print glyphs by unicode.
    Hence it simply sticks with the first unicode and writes a warning.
\stopitemize
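
The last point can be made concrete with a small sketch that inverts a
<unicode> -> <glyph-index> table. It is not the ttf2afm source: the function
name names_by_index is invented, and the sketch assumes that ``first'' means
the lowest code point, which may differ from the order ttf2afm actually uses.

```python
def names_by_index(unicode_cmap, num_glyphs):
    """Pick one printable name per glyph index.

    unicode_cmap: dict {code point: glyph index}.
    Returns (names, warnings), where names[i] is `uniXXXX' or `indexN'.
    """
    # Invert the table; several code points may share one glyph index.
    by_index = {}
    for code in sorted(unicode_cmap):
        by_index.setdefault(unicode_cmap[code], []).append(code)

    names, warnings = [], []
    for index in range(num_glyphs):
        codes = by_index.get(index, [])
        if not codes:
            # No unicode maps to this glyph: fall back to the index.
            names.append("index%d" % index)
            continue
        # Assumption: the lowest code point counts as the first one.
        names.append("uni%04X" % codes[0])
        if len(codes) > 1:
            all_names = " ".join("uni%04X" % c for c in codes)
            warnings.append("glyph %d has multiple encodings "
                            "(first one used): %s" % (index, all_names))
    return names, warnings
```

With the table {0x0162: 108, 0x021A: 108} the glyph at index 108 is named
`uni0162' and one warning is recorded, which mirrors the message shown
above.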

Now if all we want is to use pala.ttf with T1 encoding (and we don't care
about ligatures), probably the easiest way is to create a new enc file
ec-uni.enc from ec.enc, where all glyph names are replaced by unicode. This
can be done easily, for example by a Perl script that reads the AGL (Adobe
Glyph List, available at
http://www.adobe.com/devnet/opentype/archives/glyphlist.txt) and then
converts all glyph names to unicode. Assuming that we already have
ec-uni.enc, the steps needed to create the tfm can look like this:
\starttyping
ttf2afm -u -e ec-uni.enc -o pala-t1.afm pala.ttf
afm2pl pala-t1.afm
pltotf pala-t1.pl
\stoptyping
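
The conversion from ec.enc to ec-uni.enc is not spelled out here. A minimal
sketch of such a script is given below, in Python rather than Perl. The
function names parse_agl and enc_to_uni are made up for this example. The
sketch assumes the usual layout of an enc file (a vector of /name tokens
between `[' and `]', with `%' comments) and leaves every name that is not in
the AGL, such as `/.notdef', unchanged.

```python
import re


def parse_agl(glyphlist_text):
    """Read glyphlist.txt content into a dict {glyph name: code point}.

    Lines look like `A;0041'; lines starting with `#' are comments.
    Names mapped to a sequence of code points are skipped.
    """
    agl = {}
    for line in glyphlist_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, codes = line.partition(";")
        codes = codes.split()
        if len(codes) == 1:
            agl[name] = int(codes[0], 16)
    return agl


def enc_to_uni(enc_text, agl):
    """Replace AGL glyph names inside the encoding vector by `uniXXXX'."""
    def rename(match):
        code = agl.get(match.group(1))
        if code is None or code > 0xFFFF:
            return match.group(0)          # unknown name: keep as is
        return "/uni%04X" % code

    out, inside = [], False
    for line in enc_text.splitlines():
        # Keep comments (including LIGKERN lines) untouched.
        code, sep, comment = line.partition("%")
        pieces = re.split(r"([\[\]])", code)
        for i, piece in enumerate(pieces):
            if piece == "[":
                inside = True
            elif piece == "]":
                inside = False
            elif inside:
                pieces[i] = re.sub(r"/([^\s/]+)", rename, piece)
        out.append("".join(pieces) + sep + comment)
    return "\n".join(out) + "\n"
```

The name of the encoding vector itself (the token before `[') is left alone,
so the result can be saved as ec-uni.enc and used in the map entry without
further changes. Glyph names that are missing from the AGL still have to be
handled by hand.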

and to use the font:
\starttyping
\pdfmapline{+pala-t1 <ec-uni.enc <pala.ttf}
\font\f=pala-t1\f
This is a test of font Palatino Regular in T1 encoding.
\stoptyping

If we want to do more than just use pala.ttf with T1 encoding, for
example to process the afm output with fontinst for a more complex font
setup, then we must proceed slightly differently. Having an afm file where
all glyph names are converted to the `uniXXXX' form is not very useful for
fontinst. Instead, we need an afm file with AGL names to use with fontinst.
We do so as follows:

\startitemize[a,packed]
\item generate an afm with glyph names in the form `uniXXXX':
    \starttyping
    ttf2afm -u -o pala.afm pala.ttf
    \stoptyping
\item convert pala.afm to pala-agl.afm, so that pala-agl.afm contains AGL
    names only. Again, a simple Perl script can do that.
\item process pala-agl.afm by fontinst as needed.
\item in the final stage, when we already have the tfm's from fontinst and
    friends, plus the map entries (generated by fontinst, or created
    manually), we need to replace the encoding files by their counterparts
    with `uniXXXX' names. For example, if fontinst tells us to add a line
    saying
    \starttyping
    pala-agl-8r <8r.enc <pala.ttf
    \stoptyping
    to our map file, then we need to change that line to
    \starttyping
    pala-agl-8r <8r-uni.enc <pala.ttf
    \stoptyping
    where 8r-uni.enc is derived from 8r.enc by converting all glyph names
    to the `uniXXXX' form.
\stopitemize
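
The second step, turning pala.afm into pala-agl.afm, could be sketched as
follows (again in Python rather than Perl). The function names reverse_agl
and afm_to_agl are invented for this example. Since several AGL names can
share one unicode, the sketch simply keeps the name that comes first in
glyphlist.txt; this is an assumption, and the chosen name is not necessarily
the one fontinst expects.

```python
import re


def reverse_agl(glyphlist_text):
    """Read glyphlist.txt content into a dict {code point: glyph name}.

    When several names map to the same code point, the first one
    listed wins (an arbitrary choice made by this sketch).
    """
    names = {}
    for line in glyphlist_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, codes = line.partition(";")
        codes = codes.split()
        if len(codes) == 1:
            names.setdefault(int(codes[0], 16), name)
    return names


def afm_to_agl(afm_text, names):
    """Replace every `uniXXXX' token in an afm by its AGL name, if any."""
    def rename(match):
        # Keep the `uniXXXX' form when the AGL has no name for the code.
        return names.get(int(match.group(1), 16), match.group(0))

    # Whole tokens only; this covers both `N uniXXXX' in the character
    # metrics and the glyph names in kerning (KPX) lines.
    return re.sub(r"\buni([0-9A-F]{4})\b", rename, afm_text)
```

Glyphs that have no AGL name keep their `uniXXXX' or `indexN' names in the
output, so the resulting afm may still need some manual attention before it
is passed to fontinst.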