[XeTeX] Latin Modern, from TFM to Unicode

Khaled Hosny khaledhosny at eglug.org
Tue Jun 11 21:31:51 CEST 2013

On Mon, Jun 10, 2013 at 10:04:42PM -0600, Doug McKenna wrote:
> This font has many thousands of glyphs in it.  After choosing the 
> "Preview > Repertoire" menu choice in FontBook, one can scroll through 
> all the glyphs and eventually find glyph #2500, identified by Unicode 
> name "U+239B LEFT PARENTHESIS UPPER HOOK".  And glyph #2499 just before 
> it in the list appears to be the little vertical piece to use repeatedly 
> to build a vertically extensible left parenthesis with that upper hook at 
> the top.  That repeatable vertical element's Unicode identification is 
> "U+239C LEFT PARENTHESIS EXTENSION".  Okay, that makes sense.
> But then, a few glyphs later, there's another version of the top of an 
> extensible left parenthesis, Glyph #2506, which is slightly less curved.  
> But it has no associated Unicode description to it that FontBook shows.  
> I don't know why, or what the criteria might be that distinguishes when 
> it might be used as opposed to the glyph for the official Unicode code 
> point.  The next few subsequent glyphs, up to and including glyph #2511 
> also have no Unicode names displayed.  Then glyph #2512 has an official 

The mapping from regular glyphs (e.g. the normal parenthesis) to their
different sizes or extensible recipes (in the OpenType 'MATH' table) is
based solely on glyph IDs in the font, so those glyphs do not need to
have Unicode code points. Some of the parts of extensible glyphs happen
to have Unicode code points, and the font developer can choose to use
them or to use unencoded glyphs; other parts have no Unicode code points
at all, so unencoded glyphs are always used for them.
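A minimal Python sketch of that separation, using hypothetical glyph names and hand-written tables rather than data read from latinmodern-math.otf. The size variants and assembly parts are looked up by glyph, and the cmap is consulted only afterwards to see which of those glyphs happen to be encoded. The `sizes`/`assembly` layout is a simplification of the MATH table's vertical glyph constructions, not its actual binary format.

```python
# Hypothetical, hand-written data loosely modelled on an OpenType MATH
# table; the real glyph names and IDs in latinmodern-math.otf differ.

# cmap: Unicode code point -> glyph name (only "encoded" glyphs appear here)
CMAP = {
    0x0028: "parenleft",       # LEFT PARENTHESIS
    0x239B: "parenleft.top",   # LEFT PARENTHESIS UPPER HOOK
    0x239C: "parenleft.ext",   # LEFT PARENTHESIS EXTENSION
    0x239D: "parenleft.bot",   # LEFT PARENTHESIS LOWER HOOK
}

# Vertical variants keyed by glyph name, not by code point: pre-built
# larger sizes, then the parts used to assemble arbitrarily tall versions
# (listed bottom to top, as in a MATH GlyphAssembly).
VERT_VARIANTS = {
    "parenleft": {
        "sizes": ["parenleft", "parenleft.size1", "parenleft.size2"],
        "assembly": ["parenleft.bot", "parenleft.ext", "parenleft.top"],
    },
}


def classify_parts(base, variants, cmap):
    """Return (glyph, code point or None) for every size and part of base."""
    glyph_to_cp = {}
    for cp, glyph in sorted(cmap.items()):
        glyph_to_cp.setdefault(glyph, cp)  # keep the lowest code point
    entry = variants[base]
    return [(g, glyph_to_cp.get(g)) for g in entry["sizes"] + entry["assembly"]]


for glyph, cp in classify_parts("parenleft", VERT_VARIANTS, CMAP):
    label = f"U+{cp:04X}" if cp is not None else "unencoded"
    print(f"{glyph:18} {label}")
```

With this toy data the two pre-built larger sizes print as unencoded, while the three assembly parts print U+239D, U+239C and U+239B; a font could equally point its assembly at unencoded parts and nothing else would change.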

> (A separate 
> question: I wasn't aware that an OpenType font even knew anything
> about official Unicode code point names, so how does FontBook know
> which glyphs have Unicode names and which ones don't??)

OpenType fonts have a 'cmap' table that maps Unicode code points to
internal glyph IDs. "Encoded" glyphs are the ones covered by the 'cmap'
table; the rest are unencoded glyphs.
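A rough Python sketch of how a font viewer could split glyphs into encoded and unencoded ones and attach Unicode names. The glyph order and cmap below are hypothetical; with a real font they could be read with fontTools' `TTFont.getGlyphOrder()` and `TTFont.getBestCmap()`. The character names come from Python's copy of the Unicode Character Database, not from the font; that FontBook does something equivalent is an assumption.

```python
import unicodedata

# Hypothetical glyph order (list index = glyph ID) and cmap.
GLYPH_ORDER = [".notdef", "parenleft", "parenleft.top", "parenleft.ext",
               "parenleft.top.alt"]
CMAP = {0x0028: "parenleft", 0x239B: "parenleft.top", 0x239C: "parenleft.ext"}


def describe_glyphs(glyph_order, cmap):
    """List (glyph ID, glyph name, Unicode character name or None)."""
    glyph_to_cp = {}
    for cp, glyph in sorted(cmap.items()):
        glyph_to_cp.setdefault(glyph, cp)  # keep the lowest code point
    rows = []
    for gid, glyph in enumerate(glyph_order):
        cp = glyph_to_cp.get(glyph)
        # Only glyphs reachable through the cmap get a character name.
        name = unicodedata.name(chr(cp), None) if cp is not None else None
        rows.append((gid, glyph, name))
    return rows


for gid, glyph, name in describe_glyphs(GLYPH_ORDER, CMAP):
    print(gid, glyph, name or "(unencoded)")
```

Here `parenleft.top.alt` stands in for an alternative top piece like the one described above: it exists in the font but, having no cmap entry, shows no Unicode name.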

> With regards to the OpenType font "latinmodern-math.otf" that I've 
> installed, I desire to know, for all 128 glyph metrics represented by 
> "lmex10.tfm", what the official Unicode character code points are for the 
> glyphs that have those metrics in that TFM file.  Is this documented 
> anywhere, either in text form or binary in a publicly available file as 
> part of TeX, XeTeX, ...?

Most glyphs in lmex10.tfm are larger variants of other glyphs and thus
have no "official Unicode code points" of their own (apart from the
code points that map to their base glyphs).
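A minimal Python sketch of the only link that exists for such variants: from a variant glyph back to its base glyph, and from there to the base glyph's code point. The variant lists and glyph names are hypothetical; matching lmex10.tfm slots to OpenType glyphs in the first place is a separate step this sketch does not attempt.

```python
# Hypothetical data: size variants per base glyph (as could be derived
# from a MATH table's vertical variants) and a cmap.
VERT_VARIANTS = {
    "parenleft": ["parenleft", "parenleft.size1", "parenleft.size2"],
    "bracketleft": ["bracketleft", "bracketleft.size1"],
}
CMAP = {0x0028: "parenleft", 0x005B: "bracketleft"}


def base_code_point(glyph, variants, cmap):
    """Return (code point, is_variant) for glyph, or None if no link exists."""
    glyph_to_cp = {}
    for cp, name in sorted(cmap.items()):
        glyph_to_cp.setdefault(name, cp)  # keep the lowest code point
    if glyph in glyph_to_cp:
        return glyph_to_cp[glyph], False  # the glyph itself is encoded
    for base, sizes in variants.items():
        if glyph in sizes and base in glyph_to_cp:
            return glyph_to_cp[base], True  # only its base glyph is encoded
    return None
```

For example, `base_code_point("parenleft.size2", VERT_VARIANTS, CMAP)` gives `(0x28, True)`: the variant has no code point of its own and is reachable only through U+0028.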

> I'm actually interested in the answer for all the Latin Modern TFM files, 
> to the extent that they map to Unicode code points in the OpenType Latin 
> Modern files, but presumably the answer to this one math extension font 
> file will prove useful for answering all.
> I understand that there's a CMAP table in the OpenType font with a 
> Unicode encoding sub-table that maps between official Unicode code points 
> and glyph IDs.  It's getting from the 128 "code points" in the TFM files 
> to the actual Unicode code points that I'm interested in.

I'm not sure what you want to achieve, and you might be asking the
wrong question, so it would help if you elaborated on your actual goal.

