[XeTeX] Latin Modern, from TFM to Unicode
doug at mathemaesthetics.com
Wed Jun 12 21:32:10 CEST 2013
Thanks for all the responses.
I understand the distinction between Unicode characters (code points) and
glyphs, and that an OpenType font can have glyphs in it that do not
correspond to any Unicode code points. I don't quite get whether or how
those non-Unicode glyphs are subject to being found via the 'cmap' table,
or whether they have glyph IDs that are known or can be determined by
some documented convention outside the OpenType font file. Or whether
they are part of some internal ligature-like structure that only the
OpenType font has information about (which might mean that the glyph IDs
can change internally from one release to the next of the OT font).
Arthur Reutenauer responded:
> These glyphs or parts of glyphs can probably be mapped one-to-one to font
> slots in the original lmex10, but that does not make them characters.
Understood about not being characters. But it's that one-to-one mapping
from each slot in TFM to an equivalent slot in OpenType (for Latin
Modern) I'm interested in pinning down (hopefully not "probably"). It
certainly appears that every glyph represented by "lmex10.tfm" can be
found in the "Latin Modern Math" font file, though I haven't gone through
all 128 trying to find where they appear in the OT font.
Khaled Hosny wrote:
> [snip numerous good explanations]
Thanks. I understand better what's going on inside the OpenType font,
and can now imagine how FontBook is figuring out which glyphs are not the
targets of the 'cmap' table's Unicode code point inputs. And I
understand that the math extension font contains glyphs for different
sizes of the same symbol, but kept in different slots with different
glyph indices (if that's the right term) in the TFM file.
> I'm not sure what do you want to achieve, and you might be asking the wrong
> [question], so it might be better to elaborate more on your actual goal.
I have my own homebrew math layout system that determines where to place
math glyphs based on information in the lmex10.tfm and other TFM files.
For reasons peculiar to my needs, I'm not interested in creating PDF or
DVI output. I just want to draw a math glyph on my screen using "Latin
Modern Math" at a computed position, based on where TeX would place it
using the metrics in "lmex10.tfm" or other TFM file (the extent to which
I'm accurately simulating TeX is a side-issue, but I'm trying hard). My
assumption was that the glyphs in the OT file are visually the same,
and have the same metrics/bounding boxes, etc. as the original TFM
metrics. Or if they don't have quite the same metrics, the differences
are not going to change over time with new versions of the OT font.
I assumed that every one of the 128 glyphs represented by slots in
lmex10.tfm would be found in the OpenType font "Latin Modern Math", along
with lots of other glyphs. I had thought that all the glyphs in the OT
font had Unicode character designations, but have now understood that
that is not a good assumption.
Consider the radical sign. In the TFM file, there is information that
TeX uses to determine which final glyph(s) to use, based on the height of
the box of whatever's underneath the radical. So TeX chooses the glyph
in slot "70 for small height, or the glyph in slot "71 for medium height,
or the one in slot "72 for large height, or slot "73 for even larger
height. If none of those fixed-height glyphs are high enough, presumably
TeX goes into a tall symbol construction algorithm based on data within
the TFM file, using glyphs representing pieces of radical signs, kept in
slots "74, "75, and "76.
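The selection logic just described can be sketched roughly as follows. This is a minimal sketch, not TeX's actual code: the slot numbers are the ones named above, but every height is a made-up placeholder rather than a value read from lmex10.tfm, the function name choose_radical is hypothetical, and the real algorithm compares height plus depth rather than a single height.

```python
import math

# Fixed-size radicals in lmex10.tfm, smallest first.
# The heights are HYPOTHETICAL placeholders, not real TFM metrics.
RADICAL_VARIANTS = [
    (0x70, 12.0),
    (0x71, 18.0),
    (0x72, 24.0),
    (0x73, 30.0),
]

# Pieces used to build a radical taller than any fixed-size glyph.
RADICAL_PIECES = {"bottom": 0x74, "repeat": 0x75, "top": 0x76}

# HYPOTHETICAL piece heights, again not taken from the TFM file.
PIECE_HEIGHTS = {"bottom": 18.0, "repeat": 6.0, "top": 6.0}


def choose_radical(target_height):
    """Return the TFM slots to draw, listed from bottom to top."""
    # First try the fixed-size glyphs, smallest first.
    for slot, height in RADICAL_VARIANTS:
        if height >= target_height:
            return [slot]
    # Otherwise build the symbol: bottom, enough repeated bars, top corner.
    fixed = PIECE_HEIGHTS["bottom"] + PIECE_HEIGHTS["top"]
    missing = target_height - fixed
    repeats = max(0, math.ceil(missing / PIECE_HEIGHTS["repeat"]))
    return (
        [RADICAL_PIECES["bottom"]]
        + [RADICAL_PIECES["repeat"]] * repeats
        + [RADICAL_PIECES["top"]]
    )
```

With real metrics substituted for the placeholders, the returned slots are what then have to be translated into glyphs of the OpenType font.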
Using FontBook, in the "Latin Modern Math" OpenType file, the glyph for the
official Unicode code point U+221A SQUARE ROOT is glyph ID #2839. So
that's a "character" I suppose. The 'cmap' table maps that Unicode value
to that glyph ID and it can be drawn as a character would. But there are
also non-Unicode glyphs for partial radical signs, all of which look
identical to the glyphs shown by /fonttable for "lmex10.tfm" (which are
taken from some PFB file). In particular, I've figured out by inspection
the following partial answer to what I'm interested in:
small radical    TFM slot "70  ==>  OTF glyph #2843  (no Unicode designation)
medium radical   TFM slot "71  ==>  OTF glyph #2844  (no Unicode designation)
large radical    TFM slot "72  ==>  OTF glyph #2845  (no Unicode designation)
larger radical   TFM slot "73  ==>  OTF glyph #2846  (no Unicode designation)
radical bottom   TFM slot "74  ==>  OTF glyph #2840  (U+23B7 RADICAL SYMBOL BOTTOM)
vertical bar     TFM slot "75  ==>  OTF glyph #2841  (no Unicode designation)
top corner       TFM slot "76  ==>  OTF glyph #2842  (no Unicode designation)
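Written down as data, the mapping found by inspection might look like the sketch below. The glyph IDs are only the ones listed above for the one copy of the font that was examined; nothing here guarantees they hold for another release. The names TFM_SLOT_TO_GLYPH_ID and glyph_id_for_slot are hypothetical.

```python
# TFM slot in lmex10.tfm -> glyph ID in "Latin Modern Math".
# Values come from manual inspection of a single font version and are
# an ASSUMPTION for any other version of the font.
TFM_SLOT_TO_GLYPH_ID = {
    0x70: 2843,  # small radical
    0x71: 2844,  # medium radical
    0x72: 2845,  # large radical
    0x73: 2846,  # larger radical
    0x74: 2840,  # radical bottom (U+23B7)
    0x75: 2841,  # vertical bar
    0x76: 2842,  # top corner
}


def glyph_id_for_slot(slot):
    """Look up the glyph ID for a TFM slot, failing loudly if it is unknown."""
    try:
        return TFM_SLOT_TO_GLYPH_ID[slot]
    except KeyError:
        raise KeyError(f"no known glyph ID for TFM slot {slot:#04x}") from None
```

Failing loudly on an unmapped slot keeps a missing entry from silently drawing the wrong glyph.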
So given that there are partial glyphs useful for building very large
radical signs in "Latin Modern Math", and given that most, though not all
of them, have no official Unicode code point assigned to them, how does
an outside process that wants to use the OT font to draw a very large
radical sign tell the font what to draw? Since there's no mapping from
Unicode, then the outside process either needs to know the absolute glyph
IDs inside the font, or it needs to cause the font to go into some
internal construction mode, like building a ligature, where the font
itself knows the sequence and position of the glyphs to use to construct
the tall symbol. The latter seems impossible, because the font can't
know the threshold height at which to stop construction. The former
means hard-coding internal glyph IDs somewhere outside the font, which
I hope is not fragile, but worry it might be.
Sorry for the reams of details, but I'm trying to explain my confusion