[XeTeX] Microtypography?
Jonathan Kew
jonathan_kew at sil.org
Thu May 11 00:29:56 CEST 2006
On 10 May 2006, at 10:09 pm, Bruno Voisin wrote:
> On 10 May 06 at 21:48, Jonathan Kew wrote:
>
>> So the whole paragraphing algorithm
>> will have to be much more aware of the two-level character/glyph
>> model than it is at present; right now, XeTeX works in terms of
>> characters, and the details of individual glyphs are largely hidden
>> by the rendering technology (ATSUI or ICU/OpenType).
>
> In case that can be expressed in not too technical terms: what's the
> difference between character and glyph? Character a logical unit (a
> Unicode code point, maybe) and a glyph a physical unit (ie a set of
> pen strokes)? It's not that important probably, but I'm just
> interested in understanding what you mean there.
Yes, a 'character' is a logical unit in the encoded representation of
textual data; in this context, a Unicode code value such as <U+0061
LATIN SMALL LETTER A> or <U+0628 ARABIC LETTER BEH>.
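To make the "logical unit" idea concrete, here is a minimal sketch in Python (not anything XeTeX itself does) that looks up the code value and Unicode name of a character using the standard unicodedata module. The helper name describe is just an illustrative choice.

```python
import unicodedata

def describe(ch: str) -> str:
    """Return a 'U+XXXX NAME' label for a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

# A character is identified by its code value and name, not by any shape.
for ch in ("a", "\u0628"):
    print(describe(ch))
```

Nothing in this lookup says what the character looks like; that information lives in the font.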
A 'glyph' is a visual representation of a character (or not
necessarily of a single character; see below). So for the same Unicode
*character* U+0061, there may be many quite different *glyphs*,
depending on the type design (compare Times Roman 'a' and Avant Garde
'a').
In Latin script, there is a (largely) one-to-one mapping from
character to glyph in any given font, which explains the lack of a
clear character/glyph model in TeX (and other software of that era).
We have instances such as "fi", where the two characters <f, i> may
be represented by a single <fi> glyph (ligature), but there are only a
few; TeX essentially treats the ligature as another character, and
performs the ligature processing in "character space".
A cursive font such as Zapfino stretches this much further; there may
be dozens of 'a' glyphs in Zapfino, all representing the *same*
character. The selection of alternate *glyphs* depending on context
in the word, or on stylistic preferences ("I want lots of
flourishes", or "I want small capitals") should not involve changes
to the *characters* of the underlying encoded text; they represent
replacements of the *glyphs* used to render those characters.
In a script such as Arabic, a letter such as BEH will have very
different forms depending on whether it occurs at the beginning, middle,
or end of a word. But in all cases, it is the same *character*
<U+0628>; choosing the proper contextual *glyph* is the responsibility
of the font rendering subsystem, when presented with a stream of
character codes to be rendered in a particular font.
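One rough way to see this from the encoding side: Unicode also carries legacy "presentation form" code points for the positional shapes of BEH (kept for compatibility with older encodings, and not what a modern font renderer works from). The sketch below, in Python, assumes only the standard unicodedata module and shows that compatibility normalization (NFKC) folds all four shapes back to the single character U+0628. The names BEH_FORMS and base_character are illustrative.

```python
import unicodedata

# Legacy compatibility code points, one per positional shape of BEH.
# Ordinary text should contain U+0628 only; these exist for round-tripping
# older encodings and are used here purely as an illustration.
BEH_FORMS = {
    "isolated": "\uFE8F",
    "final": "\uFE90",
    "initial": "\uFE91",
    "medial": "\uFE92",
}

def base_character(ch: str) -> str:
    """Fold a presentation form back to its underlying character."""
    return unicodedata.normalize("NFKC", ch)

for position, form in BEH_FORMS.items():
    base = base_character(form)
    print(f"{position:9} U+{ord(form):04X} -> U+{ord(base):04X}")
```

In real text only U+0628 is stored, and the renderer picks the shape from the font according to context.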
So text input, and text-processing tasks such as locating possible
line-break positions, hyphenation points, etc., are all performed in
terms of *characters*. But characters have no visual form or metrics;
only *glyphs* have these. And the mapping between a character
sequence and a glyph sequence may be a very complex one, especially
in some Asian scripts. And so, although XeTeX is dealing with
typesetting text (a sequence of encoded characters), it cannot do
anything that involves metrics on a per-character basis.
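A toy sketch of why per-character metrics break down, in Python. Everything here is hypothetical: the glyph names, the advance widths, and the one-rule "shaper" are made up for illustration and bear no relation to how ATSUI or ICU actually work. The point is only that widths belong to glyphs, and that the glyph sequence is known only after the whole run has been mapped.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Glyph:
    name: str      # glyph name in a made-up font
    advance: int   # advance width in font units (invented numbers)
    start: int     # index of the first character this glyph covers
    length: int    # how many characters this glyph covers

# Hypothetical font data: advances per glyph, plus one ligature rule.
ADVANCES = {"f": 300, "i": 250, "f_i": 520, "n": 500, "d": 500}
LIGATURES = {("f", "i"): "f_i"}

def shape(text: str) -> list[Glyph]:
    """Map a character sequence to a glyph sequence (toy version)."""
    glyphs = []
    i = 0
    while i < len(text):
        pair = tuple(text[i:i + 2])
        if pair in LIGATURES:
            name = LIGATURES[pair]
            glyphs.append(Glyph(name, ADVANCES[name], i, 2))
            i += 2
        else:
            ch = text[i]
            glyphs.append(Glyph(ch, ADVANCES[ch], i, 1))
            i += 1
    return glyphs

def width(text: str) -> int:
    """Width of a run: the sum of *glyph* advances after shaping."""
    return sum(g.advance for g in shape(text))

print([g.name for g in shape("find")])
print(width("find"), sum(ADVANCES[c] for c in "find"))
```

With these invented numbers the shaped run and the naive per-character sum disagree, because the <fi> glyph covers two characters and has no meaningful split between them.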
(Look up the "character/glyph model" online if you want to read lots
more!)
JK