[texhax] TFMs for Unicode (WAS Re: Font help needed.)

Pierre MacKay pierre.mackay at comcast.net
Fri Apr 20 01:59:16 CEST 2007

Since some of the answers on this thread seem to connect with the 
problem of providing TFMs for Unicoded fonts, I would like to float the 
following proposed addition to the font naming scheme. 

It is going to be increasingly necessary to provide sets of TFMs to 
match large Unicoded sets of glyphs.  For this purpose we ought to 
borrow from Unicode organization and nomenclature.  When a TFM is 
created for one of the pages above page U+00xx,
let us simply say so, using the U (uppercase) and the hex page number 
(it might be even better if we could use U+01, etc. but including an 
arithmetic operator in a system-independent file-name is probably a bad 
idea.)  Unlike a lot of the gargantuan (and usually ephemeral) 
featuritis that presently afflicts font technology, Unicode font pages 
are not going to go away and, through the really brilliant coding of 
UTF8, they will be accessible to those of us who refuse to be dragooned 
into Vista for a long time to come.

If bchr designates the U+00 page of Bitstream Charter (I think it does, 
I don't happen to have used Charter recently) then bchr*U01 will always 
designate the Latin Extended-A glyphs (with the first part of Latin 
Extended-B) and bchr*U03 will designate either COMBINING diacriticals 
(which will never be of much direct interest to TeX users owing to 
suboptimal spacing) or monotonic Greek or both.  
Pages like U+03 leave an irreducible chance of ambiguity, but in 
real-world usage this is unlikely to cause much trouble.  At the worst, 
some Unicode pages might have to be subdivided into slices (U030 vs U037 
for instance, or U03a vs U03b).
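
To make the proposed naming concrete, here is a minimal Python sketch of 
how a name could be derived from a code point.  Several details are 
assumptions, not part of the proposal: the function name 
unicode_tfm_name is hypothetical, the suffix is glued directly onto the 
complete old fontname (the "*" above is read as standing for whatever 
weight and variant letters the existing name carries), hex digits are 
lowercase as in "U03a", and a slice is taken to be the third hex digit 
of the code point.

```python
def unicode_tfm_name(base, codepoint, sliced=False):
    """Return (tfm_name, glyph_index) for a code point in U+0000..U+FFFF.

    base   -- the existing fontname, e.g. "bchr" (assumed to cover page U+00)
    sliced -- if True, append a third hex digit to split the page into
              16-glyph slices (an assumed reading of "U030 vs U037")
    """
    if not 0 <= codepoint <= 0xFFFF:
        raise ValueError("this sketch only covers U+0000..U+FFFF")
    page, glyph = divmod(codepoint, 256)   # page = high byte, glyph = low byte
    if page == 0:
        return base, glyph                 # page U+00 keeps the old name
    suffix = "U" + format(page, "02x")
    if sliced:
        suffix += format(glyph >> 4, "x")  # high nibble of the glyph number
    return base + suffix, glyph
```

For example, unicode_tfm_name("bchr", 0x0141) gives ("bchrU01", 0x41), 
and unicode_tfm_name("bchr", 0x03B1, sliced=True) gives ("bchrU03b", 0xB1).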

In an earlier message I showed how UTF8 could easily be read with a 
simple package of plain TeX macros.  The intermediate output of this 
package is one count register containing the page number, and one 
containing the glyph number on that page.
The page number register can easily be put to use to generate the 
appropriate TFM name by catenating it onto the old fontname.  No special 
memorization of fontname categories will be needed, and the glyph number 
will always be in the range 0--255.  
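
The plain TeX macros themselves are not reproduced here.  As a rough 
illustration of the same idea, the following Python sketch decodes UTF-8 
bytes by hand and yields the two numbers described above, a page number 
and a glyph number on that page.  The function name utf8_pages is 
hypothetical, and the decoder is deliberately minimal: it rejects 
malformed sequences but does not check for overlong encodings.

```python
def utf8_pages(data):
    """Yield (page, glyph) for each character in UTF-8 encoded bytes."""
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:                  # 0xxxxxxx: one byte
            cp, extra = lead, 0
        elif lead >> 5 == 0b110:         # 110xxxxx: two bytes
            cp, extra = lead & 0x1F, 1
        elif lead >> 4 == 0b1110:        # 1110xxxx: three bytes
            cp, extra = lead & 0x0F, 2
        elif lead >> 3 == 0b11110:       # 11110xxx: four bytes
            cp, extra = lead & 0x07, 3
        else:
            raise ValueError("invalid UTF-8 lead byte at offset %d" % i)
        trail = data[i + 1:i + 1 + extra]
        if len(trail) != extra:
            raise ValueError("truncated UTF-8 sequence at offset %d" % i)
        for b in trail:
            if b >> 6 != 0b10:           # continuation bytes are 10xxxxxx
                raise ValueError("invalid continuation byte")
            cp = (cp << 6) | (b & 0x3F)
        i += 1 + extra
        # The glyph number is always 0..255; the page number exceeds 255
        # only for four-byte sequences (code points above U+FFFF).
        yield cp >> 8, cp & 0xFF
```

Feeding it the UTF-8 bytes for "A", U+03BB and U+4E2D yields (0x00, 0x41), 
(0x03, 0xBB) and (0x4E, 0x2D) in turn.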

Even CJK can be handled this way.  The apparent size of a CJK repertory 
is daunting, but in the few cases when I have had to consider CJK 
setting, I have not been surprised to see that any real-world document 
other than a dictionary is likely to use a manageable subset of the 
whole range.  The plain TeX macros mentioned above do not require a 
complete set, or even a continuous set, of "subunits" to extract the 
required Unicode pages.  Only the ones that correspond with the group of 
documents you happen to be setting need to be provided. 
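
One way to see how small that subset is for a given document is simply 
to list the pages it touches.  The sketch below does that in Python; 
required_pages is a hypothetical helper, not part of the macro package, 
and it works on already-decoded text.

```python
def required_pages(text):
    """Return the sorted list of 256-glyph Unicode pages used by text."""
    return sorted({ord(ch) >> 8 for ch in text})


# A short mixed Latin/CJK string touches only five pages.
sample = "TeX \u4e2d\u6587\u6392\u7248"
pages = required_pages(sample)
# Assumed naming: "U" plus the page in lowercase hex, as in "U03a".
suffixes = ["U%02x" % p for p in pages if p != 0]
```

Here pages is [0x00, 0x4E, 0x63, 0x65, 0x72], so under the proposed 
scheme only four additional TFMs (suffixes U4e, U63, U65 and U72) would 
be needed beyond the existing one for page U+00.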

All of the above can be done with trip-tested TeX 3.14159n, and it means 
that we can retain the full power of DEK's TFM and VF programming 
unchanged.  I have the old-fashioned sense that that is really rather a 

Pierre MacKay

Karl Berry wrote:
>     The problem is with fonts where the tfm name is not the 
>     same as the pfb name.
> This is, as you note, very common.  That is how we handle the TeX 256
> chars per encoding limitation -- OT1, T1, texnansi, etc., each have to
> have their own tfm, even if they all map to a single pfb.
> Indeed, the Cyrillic font tfm's are generated by mktextfm.  There are
> scripts in the lh distribution to do it all in batch if you're so
> inclined.  Probably not very interesting for purposes of a font
> catalogue, in any case.  (It could be interesting if each font had the
> supported encodings listed, though.)
> karl
