Math glyph names statistics

Tue, 12 Dec 2000 11:48:16 -0500

Hi, all.

Over the last few weeks I've spent some time (much too much, really)
compiling statistics on the extent to which the various sources that
assign names to math glyphs agree on those names. The results can be
found below.

More concretely, I've tried to use the fontinst-related files in Matthias
Clasen's and Ulrik Vieth's mathfont
package [1] as a source of correspondences between glyphs from different
fonts: If glyph X from font A and glyph Y from font B appear under the same
name when the virtual fonts are written then they are considered to be
equivalent. (It's not a proper equivalence relation, but it's pretty close,
so in the following I will assume it is.) Each such equivalence class of
glyphs defines a partition (by what name the font uses for the glyph) of
the set of fonts which provide the glyph, e.g. if glyph X in font A, glyph
Y in font B, and glyph X in font C are equivalent then the corresponding
partition of the set of fonts is {A,C},{B} because A and C both use the
name X whereas B calls the corresponding glyph Y. In the notation below
this partition would be denoted as AC.B.
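The notation can be illustrated with a minimal Python sketch. This is not
the script actually used for the statistics; the function name
partition_label and the dictionary layout are assumptions made only for
this example.

```python
from collections import defaultdict


def partition_label(names_by_font):
    """Build the partition label for one equivalence class of glyphs.

    names_by_font maps a font letter to the glyph name that font uses
    (hypothetical layout, chosen for this sketch).
    """
    parts = defaultdict(list)
    for font, glyph_name in names_by_font.items():
        # Fonts that use the same glyph name end up in the same part.
        parts[glyph_name].append(font)
    # Sort the letters inside each part, then sort the parts themselves.
    return ".".join(sorted("".join(sorted(fonts)) for fonts in parts.values()))


# The example from the text: A and C use the name X, B uses the name Y.
print(partition_label({"A": "X", "B": "Y", "C": "X"}))  # prints AC.B
```

Note that the tables further down write a "." after every part, so the
same partition would appear there as "AC.B." instead.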

The sources of glyph names that I have compared are:
  Lucida fonts (denoted L in the table),
  Mathematica fonts (denoted M in the table),
  Adobe Symbol font (denoted S in the table),
  MathTime fonts (denoted T in the table), and
  the .enc files for the OM? encodings that come with dvips (denoted E in
    the table).
The various MetaFont math fonts that exist haven't been considered since
one should take the metrics for these from the PL files, and then one has
to assign names to them anyway. There are also some non-MetaFont math font
families (e.g. Pazo Math) which I haven't considered because mathfont
doesn't have installation scripts for them.

Anyway, when I count how many times a certain partition of the glyph name
sources occurs among the approx. 1300 equivalence classes of glyphs that
result, I get the table titled "Sorted by target name" below:

Sorted by target name:        Sorted by source name:
Cases  Partition              Cases  Partition
  657  M.                       997  M.
   93  EL.M.T.                  117  EL.
   68  EL.                       51  E.
   56  S.                        38  T.
   66  M.S.                      37  S.
   63  E.M.T.                    26  ELS.
   48  EL.M.                     25  EL.ELT.T.
   27  M.T.                      24  S.T.           <
   26  ELT.M.                    24  EL.T.          <
   21  E.                        24  EL.S.          <
   19  T.                        22  E.L.           <
   18  E.M.                      21  E.S.           <
   18  ELS.M.                    17  ES.
   14  EL.M.S.                   13  M.S.           <
   14  E.M.S.                    11  E.M.S.         <
   13  M.S.T.                     6  ELS.T.         <
   12  EL.T.                      2  E.LS.          <
   10  L.                         2  M.T.           <
   10  L.M.T.                     2  ELST.T.
    6  E.T.                       2  EL.L.
    5  EL.M.S.T.                  2  ELT.T.
    4  ES.M.                      1  EL.L.S.        <
    4  L.T.                       1  E.L.S.         <
    3  L.M.                       1  ES.S.T.        <
    2  EL.M.ST.                   1  ES.T.          <
    2  E.LT.M.                    1  L.LT.ST.       <
    2  ELT.                       1  E.EL.ELT.T.
    1  ES.L.M.                    1  E.EL.T.
    1  LS.M.
    1  L.M.S.
    1  M.ST.

In that table one can see that the Mathematica fonts never use the same
name for a glyph as any of the other fonts (when M appears in the table it
is always in a part of its own)---instead the glyph names there seem to be
the same names as used internally in the Mathematica program---and thus
glyph name aliasing seems to be the right approach for those fonts. About
the other sources one may notice that the Lucida fonts and the encoding
files use the same names for almost all glyphs they have in common, whereas
the Symbol and MathTime fonts use their own names for some glyphs and the
same names as Lucida for others; in other words there is no clear pattern
for those.
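For the record, the counting behind the "Sorted by target name" column
can be sketched roughly as follows in Python. This is a reconstruction
under assumptions, not the scripts I actually ran: the record layout
(source, glyph name in the source, name in the virtual font) and all the
sample glyph names are invented for the example.

```python
from collections import Counter, defaultdict

# Hypothetical sample data: (source, source glyph name, target name).
RECORDS = [
    ("E", "alpha", "alpha"),
    ("L", "alpha", "alpha"),
    ("M", "Alpha", "alpha"),
    ("T", "alpha1", "alpha"),
    ("M", "Gamma", "gamma"),
    ("S", "gamma", "gamma"),
    ("M", "Aleph", "aleph"),
]


def count_partitions(records):
    """Count how often each partition of the sources occurs."""
    # target name -> source glyph name -> set of sources using that name
    classes = defaultdict(lambda: defaultdict(set))
    for source, glyph_name, target in records:
        classes[target][glyph_name].add(source)
    counts = Counter()
    for by_name in classes.values():
        parts = sorted("".join(sorted(s)) for s in by_name.values())
        # Table notation: every part is followed by a dot.
        counts["".join(part + "." for part in parts)] += 1
    return counts


for label, cases in count_partitions(RECORDS).most_common():
    print(f"{cases:5d}  {label}")
```

Grouping by target name first and by source name second is what makes
each row a partition of the set of sources that provide the glyph.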

The table titled "Sorted by source name" is an attempt to locate glyphs
that would mess things up for a simple aliasing scheme (because two
fonts use the same name for two glyphs which are not equivalent). Here each
"case" is a glyph name used in some source and the set of sources is
partitioned according to whether the glyphs are considered to be
equivalent. What one would like to see here is that the partitions only
have one part, because that means that the glyph name alone uniquely
identifies the glyph, but due to the sloppy way in which "glyph
equivalence" has been defined this isn't necessarily what one sees. Many of
the "partitions" aren't proper partitions, but merely families of sets, and
I believe this is mainly because the same glyph is given several
different names by the MTX files (cf. Eng and Ng in latin.mtx), so those
"partitions" where there is one part which contains all the others are
probably OK as well. Those partitions that probably aren't OK have been
marked by a < to the right; the number of glyph names which are affected by
this is 154 (if I rewrite my scripts a bit I could probably find out which
glyph names these are as well). Some examples are the _lower_case_ Greek
letters, which the Mathematica fonts name Alpha, Beta, Gamma, ... whereas
most other fonts use Gamma for the upper case letter.
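The "probably OK" criterion described above (a single part, or one part
which contains all the others) can be sketched in Python like this. Again
this is only an illustration under assumptions: the record layout, the
function names, and the sample names "foo" and "fooalt" are made up, and
the check is a literal reading of the criterion, not a reproduction of
the markings in the table.

```python
from collections import defaultdict

# Hypothetical sample data: (source, source glyph name, target name).
RECORDS = [
    ("M", "Gamma", "gamma"),   # Mathematica: the lower case letter
    ("E", "Gamma", "Gamma"),   # other sources: the upper case letter
    ("S", "Gamma", "Gamma"),
    ("E", "foo", "foo"),
    ("L", "foo", "foo"),
    ("L", "foo", "fooalt"),    # same glyph written under a second name
]


def families_by_source_name(records):
    """Map each glyph name to a family of source sets, one set per class."""
    by_name = defaultdict(lambda: defaultdict(set))
    for source, glyph_name, target in records:
        by_name[glyph_name][target].add(source)
    return {name: [frozenset(s) for s in classes.values()]
            for name, classes in by_name.items()}


def probably_ok(family):
    """True if some part contains every other part (or there is one part)."""
    union = frozenset().union(*family)
    # A part contains all the others exactly when it equals the union.
    return union in family


for name, family in sorted(families_by_source_name(RECORDS).items()):
    label = "".join(p + "." for p in sorted("".join(sorted(s)) for s in family))
    print(f"{name:8s} {label:12s} {'' if probably_ok(family) else '<'}")
```

With the sample data, "foo" gives EL.L. (one part contains the other, so
no mark), while "Gamma" gives ES.M. and gets the < mark.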

Now, are there any known sources of errors in the above data? There are
several. To begin with, the glyph names in many AFMs for math fonts are
rubbish, but it seems the foundries have improved in this respect; the
first set of AFMs for the Mathematica fonts I found came with Mathematica
3, were made in 1996, and had glyph names that had nothing to do with what
the glyphs looked like, but in the AFMs that came with Mathematica 4 (made
in 1999) the glyph names made sense. Another major source of error is that the
Lucida and MathTime AFMs that I've used neither seem to cover all the fonts
in these families nor are particularly up to date (there is a certain
amount of rubbish names in them), so those families probably look worse in
the above tables than they really should. I've tried to find better AFMs,
but Y&Y doesn't seem to have them available for download. :-( (In the event
that anyone sends me better AFMs, I would of course redo the statistics
with the corrected data.) Finally, the method used to determine whether
glyphs are equivalent is not good; what one really should do is compare the
glyphs themselves, but since I haven't got all the fonts I haven't been
able to do that.

Still hoping that the above might be useful,
Lars Hellström

Reference(s):
[1] The mathfont package (or system, or whatever) is a collection of files
    for making, installing, and using math fonts with the "new" 8-bit math
    font encodings (M?? encodings). Some of it is MetaFont sources, but
    the part I've been using is the fontinst files that build virtual
    fonts from various (mostly commercial) fonts with other encodings
    than the M?? encodings.

    As far as I can tell there hasn't been any development on mathfont
    since the 1998 version 0.59 (does anyone have more precise
    information?). mathfont is not on CTAN, but it can be downloaded
    from the file archive of the TeX Extended Mathematics Font
    Encoding Working Group at http://tug.org/twg/mfg/archives/.