[XeTeX] Latin Modern, from TFM to Unicode

Doug McKenna doug at mathemaesthetics.com
Tue Jun 11 06:04:42 CEST 2013

All -

I don't really know which TeX-related list this should go to, so I'm 
starting with this one, since it deals with OpenType fonts.

This is a TFM <==> OpenType math font question.

By using the "fonttable" package (under TeX Live 2010, LaTeX2e), I'm able 
to create a PDF that shows the glyphs for all 128 code points 
(characters) of the font file "lmex10.tfm" (Latin Modern Math Extension 
10 pt font).  The LaTeX code to do this is simple:

\documentclass{article}  % or whatever
\usepackage{fonttable}
\begin{document}
\fonttable{lmex10}
\end{document}

The actual glyphs are presumably incorporated, as if by magic, by pdfTeX 
(or some variant) into the final output file using the associated binary 
printer font file "lmex10.pfb", although I'm not really clear about how 
all that works.  In any case, that's a separate issue.

For instance, glyph 48 (decimal; "30 hex, '060 octal) in "lmex10.tfm" 
is the upper piece of a vertically extensible left parenthesis.  It 
appears that either glyph 66 or glyph 67 (decimal) is the repeatable 
element making up the vertical extension to build that tall left 
parenthesis, though I can't be certain because there are other vertical 
extension pieces nearby in the table.
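The TFM file itself may settle whether 66 or 67 is the repeater.  In the 
TFM format, a character whose tag field equals 3 (ext_tag) carries an 
extensible recipe giving the character codes of its top, middle, bottom, 
and repeatable pieces.  The stdlib-only Python sketch below reads that 
recipe for one character.  It follows the TFM layout documented with 
Knuth's TFtoPL, but it has not been run against "lmex10.tfm", and whether 
slot 48 itself carries the recipe (rather than another slot in its chain 
of larger variants) is an assumption.

```python
import struct
from typing import NamedTuple, Optional


class Extensible(NamedTuple):
    """A TFM extensible recipe: character codes of the pieces.

    A zero top, mid, or bot means that piece is absent; rep is always used.
    """
    top: int
    mid: int
    bot: int
    rep: int


EXT_TAG = 3  # tag value: the remainder byte indexes the exten[] table


def extensible_recipe(tfm: bytes, code: int) -> Optional[Extensible]:
    """Return the extensible recipe for `code`, or None if it has none."""
    # The file starts with twelve 16-bit big-endian table lengths.
    (_lf, lh, bc, ec, nw, nh, nd, ni, nl, nk, _ne, _np) = struct.unpack_from(
        ">12H", tfm, 0)
    if not bc <= code <= ec:
        return None
    # Every table is measured in 4-byte words; char_info follows the header.
    char_info_start = 6 + lh
    word = char_info_start + (code - bc)
    width_index, _height_depth, italic_tag, remainder = tfm[4 * word: 4 * word + 4]
    if width_index == 0:
        return None  # width index 0 marks a nonexistent character
    if italic_tag & 0b11 != EXT_TAG:
        return None
    # exten[] follows width, height, depth, italic, lig_kern and kern.
    exten_start = char_info_start + (ec - bc + 1) + nw + nh + nd + ni + nl + nk
    pos = 4 * (exten_start + remainder)
    return Extensible(*tfm[pos: pos + 4])
```

Usage would be something like 
`extensible_recipe(open("lmex10.tfm", "rb").read(), 48)`.  Running TeX 
Live's `tftopl lmex10.tfm` should show the same information as VARCHAR 
entries in human-readable property-list form.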

I'm on a Mac (Mac OS X 10.7.5), and I've installed a recent version of 
the OpenType Latin Modern Math font ("latinmodern-math.otf") in my 
"~/Library/Fonts/..." directory, and have looked at it using FontBook.

This font has many thousands of glyphs in it.  After choosing the 
"Preview > Repertoire" menu choice in FontBook, one can scroll through 
all the glyphs and eventually find glyph #2500, identified by Unicode 
name "U+239B LEFT PARENTHESIS UPPER HOOK".  Glyph #2499, just before 
it in the list, appears to be the little vertical piece to use repeatedly 
to build a vertically extensible left parenthesis with that upper hook at 
the top.  That repeatable vertical element's Unicode identification is 
"U+239C LEFT PARENTHESIS EXTENSION".  Okay, that makes sense.

But then, a few glyphs later, there's another version of the top of an 
extensible left parenthesis, glyph #2506, which is slightly less curved.  
FontBook shows no Unicode name for it.  I don't know why, or what 
criteria distinguish when it might be used as opposed to the glyph for 
the official Unicode code point.  The next few glyphs, up to and 
including glyph #2511, also have no Unicode names displayed.  Then glyph 
#2512 has an official Unicode name (U+23A9 LEFT CURLY BRACKET LOWER 
HOOK).  (A separate question: I wasn't aware that an OpenType font knew 
anything about official Unicode character names, so how does FontBook 
know which glyphs have Unicode names and which ones don't?)
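On the separate question, one plausible (unverified) explanation is that 
the names do not come from the font at all.  The font's cmap table maps 
code points to glyphs, and a viewer can then look up each code point's 
standard name in the Unicode Character Database.  Glyphs that no cmap 
entry points to, such as size variants or alternate pieces reached only 
through OpenType math or substitution tables, would have no code point 
and hence no name to show.  The sketch below illustrates the idea with 
Python's standard `unicodedata` module; `unicode_label` is a hypothetical 
helper, not FontBook's actual logic.

```python
import unicodedata
from typing import Optional


def unicode_label(code_point: Optional[int]) -> str:
    """Label a glyph by the code point its cmap entry gives it, if any."""
    if code_point is None:
        # No cmap entry points at this glyph, so there is nothing to name.
        return "(no Unicode name)"
    # The name comes from Python's copy of the Unicode Character Database.
    name = unicodedata.name(chr(code_point), "<unnamed>")
    return f"U+{code_point:04X} {name}"


for cp in (0x239B, 0x239C, 0x23A9, None):
    print(unicode_label(cp))
```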

With regard to the OpenType font "latinmodern-math.otf" that I've 
installed, I'd like to know, for each of the 128 characters whose 
metrics are given in "lmex10.tfm", the official Unicode code point of the 
corresponding glyph.  Is this documented anywhere, either as text or as 
a binary, publicly available file that is part of TeX, XeTeX, ...?

I'm actually interested in the answer for all the Latin Modern TFM files, 
to the extent that they map to Unicode code points in the OpenType Latin 
Modern files, but presumably the answer to this one math extension font 
file will prove useful for answering all.

I understand that there's a cmap table in the OpenType font with a 
Unicode encoding subtable that maps official Unicode code points to 
glyph IDs.  What I'm missing is how to get from the 128 "code points" in 
the TFM files to the actual Unicode code points.
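The code-point-to-glyph half of the lookup can be sketched directly.  
Assuming the font contains a Windows Unicode BMP subtable (platform 3, 
encoding 1) in format 4, as most OpenType fonts do, the stdlib-only 
Python below reads the cmap table from the font file and maps a code 
point to a glyph ID.  The function names are illustrative, and whether 
FontBook's "#2500"-style numbers coincide with these glyph IDs has not 
been checked.  It leaves open the TFM-slot-to-code-point half.

```python
import struct
from typing import Optional


def read_table(font: bytes, tag: bytes) -> bytes:
    """Return the raw bytes of one sfnt table (e.g. b"cmap")."""
    num_tables, = struct.unpack_from(">H", font, 4)
    for i in range(num_tables):
        # Table records follow the 12-byte header, 16 bytes each.
        rec_tag, _checksum, offset, length = struct.unpack_from(
            ">4sIII", font, 12 + 16 * i)
        if rec_tag == tag:
            return font[offset: offset + length]
    raise KeyError(tag.decode())


def find_unicode_bmp_subtable(cmap: bytes) -> int:
    """Return the offset of the (3, 1) Unicode BMP subtable within cmap."""
    _version, num_tables = struct.unpack_from(">HH", cmap, 0)
    for i in range(num_tables):
        platform_id, encoding_id, offset = struct.unpack_from(
            ">HHI", cmap, 4 + 8 * i)
        if (platform_id, encoding_id) == (3, 1):
            return offset
    raise ValueError("no (3, 1) Unicode BMP subtable")


def format4_glyph_id(cmap: bytes, sub: int, code_point: int) -> Optional[int]:
    """Look up `code_point` in the format 4 subtable starting at byte `sub`."""
    fmt, = struct.unpack_from(">H", cmap, sub)
    if fmt != 4:
        raise ValueError(f"expected format 4, got {fmt}")
    seg_count = struct.unpack_from(">H", cmap, sub + 6)[0] // 2
    end_codes = sub + 14
    start_codes = end_codes + 2 * seg_count + 2  # skip reservedPad
    id_deltas = start_codes + 2 * seg_count
    id_range_offsets = id_deltas + 2 * seg_count
    for i in range(seg_count):
        end, = struct.unpack_from(">H", cmap, end_codes + 2 * i)
        if code_point > end:
            continue
        start, = struct.unpack_from(">H", cmap, start_codes + 2 * i)
        if code_point < start:
            return None  # segments are sorted by end code; no later match
        delta, = struct.unpack_from(">h", cmap, id_deltas + 2 * i)
        ro_pos = id_range_offsets + 2 * i
        range_offset, = struct.unpack_from(">H", cmap, ro_pos)
        if range_offset == 0:
            glyph = (code_point + delta) & 0xFFFF
        else:
            # idRangeOffset is a byte offset from its own position
            # into glyphIdArray.
            glyph, = struct.unpack_from(
                ">H", cmap, ro_pos + range_offset + 2 * (code_point - start))
            if glyph:
                glyph = (glyph + delta) & 0xFFFF
        return glyph or None  # glyph 0 is .notdef, i.e. unmapped
    return None


def glyph_id(path: str, code_point: int) -> Optional[int]:
    """Map a Unicode code point to a glyph ID in the font at `path`."""
    with open(path, "rb") as f:
        cmap = read_table(f.read(), b"cmap")
    return format4_glyph_id(cmap, find_unicode_bmp_subtable(cmap), code_point)
```

With the path to the installed font, a call such as 
`glyph_id("latinmodern-math.otf", 0x239B)` would return the glyph ID for 
LEFT PARENTHESIS UPPER HOOK.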

Despite all manner of searching, the answer to this simple question has 
so far proved elusive.

Thanks in advance for any non-elusive elucidations.

Doug McKenna
