[XeTeX] Latin Modern, from TFM to Unicode

Adam Twardoch (List) list.adam at twardoch.com
Wed Jun 12 20:57:45 CEST 2013


If you think of the TFM slot indices as "glyph indices" rather than 
"character codes", then you can possibly find a 1:1 mapping from all TFM 
indices to glyph IDs in the OTF, but not to Unicode code points. If your 
method of drawing glyphs on screen lets you address glyph IDs directly 
(e.g. using FreeType or another library that allows low-level addressing 
of glyph IDs within an OTF font file), then you should be able to 
achieve it.
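
As a rough illustration of treating slots as glyph indices, here is a 
minimal Python sketch of a lookup table keyed by TFM slot. The seven 
entries are the by-inspection radical values quoted further down in this 
message; the names LMEX10_SLOT_TO_GID and glyph_id_for_slot are made up 
for the example, and the glyph IDs themselves are an assumption that may 
not hold for another release of the font. The returned ID is what you 
would hand to a glyph-index call such as FreeType's FT_Load_Glyph, which 
bypasses the 'cmap' table.

```python
# Hypothetical table: lmex10.tfm slot -> glyph ID in the Latin Modern Math OTF.
# The values come from inspection of one font release (see the quoted message);
# they are not an official mapping and may change between releases.
LMEX10_SLOT_TO_GID = {
    0x70: 2843,  # small radical
    0x71: 2844,  # medium radical
    0x72: 2845,  # large radical
    0x73: 2846,  # larger radical
    0x74: 2840,  # radical bottom (U+23B7)
    0x75: 2841,  # vertical bar
    0x76: 2842,  # top corner
}


def glyph_id_for_slot(slot: int) -> int:
    """Return the recorded OTF glyph ID for a TFM slot; raise if unmapped."""
    try:
        return LMEX10_SLOT_TO_GID[slot]
    except KeyError:
        raise KeyError(f"no glyph ID recorded for TFM slot {slot:#04x}") from None
```

Failing loudly on an unmapped slot seems safer here than guessing, since 
a wrong glyph ID would silently draw the wrong shape.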

However, I personally don't know which glyphs have this correspondence, 
or whether the glyphs in the OTFs have the same repertoire or metrics as 
those in the TFMs. Your best bet is probably to contact the GUST 
e-foundry project members.

Unfortunately, apart from http://www.gust.org.pl/contact-info I don't 
see any easy way to contact them using public channels.


On 13-06-12 21:32, Doug McKenna wrote:
> Thanks for all the responses.
> I understand the distinction between Unicode characters (code points) and
> glyphs, and that an OpenType font can have glyphs in it that do not
> correspond to any Unicode code points.  I don't quite get whether or how
> those non-Unicode glyphs are subject to being found via the 'cmap' table,
> or whether they have glyph IDs that are known or can be determined by
> some documented convention outside the OpenType font file.  Or whether
> they are part of some internal ligature-like structure that only the
> OpenType font has information about (which might mean that the glyph IDs
> can change internally from one release to the next of the OT font).
> Arthur Reutenauer responded:
>> These glyphs or parts of glyphs can probably be mapped one-to-one to font
>> slots in the original lmex10, but that does not make them characters.
> Understood about not being characters.  But it's that one-to-one mapping
> from each slot in TFM to an equivalent slot in OpenType (for Latin
> Modern) I'm interested in pinning down (hopefully not "probably").  It
> certainly appears that every glyph represented by "lmex10.tfm" can be
> found in the "Latin Modern Math" font file, though I haven't gone through
> all 128 trying to find where they appear in the OT font.
> Khaled Hosny wrote:
>> [snip numerous good explanations]
> Thanks.  I understand better what's going on inside the OpenType font,
> and can now imagine how FontBook is figuring out which glyphs are not the
> targets of the 'cmap' table's Unicode code point inputs.  And I
> understand that the math extension font contains glyphs for different
> sizes of the same symbol, but kept in different slots with different
> glyph indices (if that's the right term) in the TFM file.
>> I'm not sure what you want to achieve, and you might be asking the wrong
>> question, so it might be better to elaborate more on your actual goal.
> I have my own homebrew math layout system that determines where to place
> math glyphs based on information in the lmex10.tfm and other TFM files.
> For reasons peculiar to my needs, I'm not interested in creating PDF or
> DVI output.  I just want to draw a math glyph on my screen using "Latin
> Modern Math" at a computed position, based on where TeX would place it
> using the metrics in "lmex10.tfm" or other TFM file (the extent to which
> I'm accurately simulating TeX is a side-issue, but I'm trying hard).  My
> assumption was that the glyphs in the OT file are visually the same,
> and have the same metrics/bounding boxes, etc. as the original TFM
> metrics.  Or if they don't have quite the same metrics, the differences
> are not going to change over time with new versions of the OT font.
> I assumed that every one of the 128 glyphs represented by slots in
> lmex10.tfm would be found in the OpenType font "Latin Modern Math", along
> with lots of other glyphs.  I had thought that all the glyphs in the OT
> font had Unicode character designations, but have now understood that
> that is not a good assumption.
> Consider the radical sign.  In the TFM file, there is information that
> TeX uses to determine which final glyph(s) to use, based on the height of
> the box of whatever's underneath the radical.  So TeX chooses the glyph
> in slot "70 for small height, or the glyph in slot "71 for medium height,
> or the one in slot "72 for large height, or slot "73 for even larger
> height.  If none of those fixed-height glyphs are high enough, presumably
> TeX goes into a tall symbol construction algorithm based on data within
> the TFM file, using glyphs representing pieces of radical signs, kept in
> slots "74, "75, and "76.
> Using FontBook, in the "Latin Modern" OpenType file, the glyph for the
> official Unicode code point U+221A SQUARE ROOT is glyph ID #2839.  So
> that's a "character" I suppose.  The 'cmap' table maps that Unicode value
> to that glyph ID and it can be drawn as a character would.  But there are
> also non-Unicode glyphs for partial radical signs, all of which look
> identical to the glyphs shown by /fonttable for "lmex10.tfm" (which are
> taken from some PFB file).  In particular, I've figured out by inspection
> the following partial answer to what I'm interested in:
> small radical    TFM slot "70 ==> OTF glyph #2843 (no Unicode designation)
> medium radical   TFM slot "71 ==> OTF glyph #2844 (no Unicode designation)
> large radical    TFM slot "72 ==> OTF glyph #2845 (no Unicode designation)
> larger radical   TFM slot "73 ==> OTF glyph #2846 (no Unicode designation)
> radical bottom   TFM slot "74 ==> OTF glyph #2840 (U+23B7 RADICAL SYMBOL BOTTOM)
> vertical bar     TFM slot "75 ==> OTF glyph #2841 (no Unicode designation)
> top corner       TFM slot "76 ==> OTF glyph #2842 (no Unicode designation)
> So given that there are partial glyphs useful for building very large
> radical signs in "Latin Modern Math", and given that most, though not all
> of them, have no official Unicode code point assigned to them, how does
> an outside process that wants to use the OT font to draw a very large
> radical sign tell the font what to draw?  Since there's no mapping from
> Unicode, then the outside process either needs to know the absolute glyph
> IDs inside the font, or it needs to cause the font to go into some
> internal construction mode, like building a ligature, where the font
> itself knows the sequence and position of the glyphs to use to construct
> the tall symbol.  The latter seems impossible, because the font can't
> know the threshold height at which to stop construction.  The former
> means hard coding internal glyph IDs somewhere outside the font, which
> I'm hoping is not fragile, but worry it might be.
> Sorry for the reams of details, but I'm trying to explain my confusion
> exactly.
> Doug McKenna
> --------------------------------------------------
> Subscriptions, Archive, and List information, etc.:
>    http://tug.org/mailman/listinfo/xetex
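
On the question of who knows the threshold height: in the scheme you 
describe, that decision sits with the layout code, not with the font. 
Below is a simplified Python sketch of that selection, assuming the four 
fixed sizes are tried in order and the extensible pieces are used only 
when none is tall enough. The extents are invented placeholder numbers, 
not values read from lmex10.tfm, and the names Piece and radical_slots 
are hypothetical. As far as I understand, the OpenType MATH table lists 
size variants and assembly parts for this same purpose, so it may be 
worth checking there before hard-coding glyph IDs.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Piece:
    slot: int    # TFM slot number
    extent: int  # height + depth in made-up units (placeholder, not from the TFM)


# Fixed-size radicals, smallest first (slots "70 to "73).
VARIANTS = [Piece(0x70, 12), Piece(0x71, 18), Piece(0x72, 24), Piece(0x73, 30)]
# Pieces for the built-up radical (slots "74, "75, "76).
BOTTOM, REPEAT, TOP = Piece(0x74, 18), Piece(0x75, 6), Piece(0x76, 6)


def radical_slots(needed: int) -> List[int]:
    """Return the slots to draw, bottom to top, to cover `needed` units."""
    # Use the first fixed-size glyph that is tall enough.
    for variant in VARIANTS:
        if variant.extent >= needed:
            return [variant.slot]
    # Otherwise stack bottom + repeated bar + top until the height is covered.
    fixed = BOTTOM.extent + TOP.extent
    missing = max(0, needed - fixed)
    repeats = -(-missing // REPEAT.extent)  # integer ceiling division
    return [BOTTOM.slot] + [REPEAT.slot] * repeats + [TOP.slot]
```

The caller supplies the required height, so the font never has to know 
where to stop; it only supplies the pieces and their metrics.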


May success attend your efforts,
-- Adam Twardoch
(Remove "list." from e-mail address to contact me directly.)
