glyph names for accents

Lars Hellström Lars.Hellstrom@math.umu.se
Sat, 1 Sep 2001 18:34:04 +0200


At 18.49 +0200 2001-08-31, Vladimir Volovich wrote:
>btw, here is a fix for typos in t1draft.etx:
[snip]

Thanks!

>and i also have a few questions:
>
>1) for accent characters in the range 0-12 you use unicode values from
>the "combining diacritical marks" range, which means e.g. that
>character 0 from T1 encoding should be named 'gravecomb' (U0300)
>rather than 'grave' (U0060).
>
>i'd like to know whether this is the right approach? i ask because all
>existing *.enc files seem to use non-combining glyph names:
>  /grave /acute /circumflex /tilde /dieresis /hungarumlaut
>  /ring /caron /breve /macron /dotaccent /cedilla /ogonek

I actually hadn't noticed that the Adobe Glyph List says gravecomb for
U+0300, but I don't think their assignment is of much relevance here. One
must realise that there isn't a 1-1 correspondence between Unicode code
points and glyph names, although the AGL seems to strive for such a
correspondence and thus invents 'gravecomb' since 'grave' is already used.

In my opinion Subsection 3.1 of encspecs.tex is quite clear on the matter:
   "From the point of formal specification, the [choice of names for glyphs]
   can be completely arbitrary, but from the point of practical usefulness
   they most likely are not. [...] the glyph names are best chosen to be
   the ones one can expect to find in actual fonts, as that will make
   things easier for other people that want to make non-experimental
   implementations later."
The identification of characters comes first, and there slot 0 is clearly
U+0300. The glyph names come last, and the usual name for the grave accent
is 'grave'.
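
To make that two-step ordering concrete, here is a minimal sketch in Python
(not part of fontinst or of t1draft.etx). The table name T1_ACCENTS and the
function describe are hypothetical. The slot order is taken from the list of
glyph names quoted above, and the code points are my own reading of the
combining diacritical marks range, so treat them as assumptions to check
against the actual file.

```python
import unicodedata

# Slots 0-12 of T1: first the character (a combining mark), then the
# glyph name one expects to find in actual fonts. Illustrative only;
# the code points are assumptions, not copied from t1draft.etx.
T1_ACCENTS = [
    (0x0300, "grave"),
    (0x0301, "acute"),
    (0x0302, "circumflex"),
    (0x0303, "tilde"),
    (0x0308, "dieresis"),
    (0x030B, "hungarumlaut"),
    (0x030A, "ring"),
    (0x030C, "caron"),
    (0x0306, "breve"),
    (0x0304, "macron"),
    (0x0307, "dotaccent"),
    (0x0327, "cedilla"),
    (0x0328, "ogonek"),
]


def describe(slot):
    """Return one line: slot, code point, Unicode name, glyph name."""
    code_point, glyph = T1_ACCENTS[slot]
    name = unicodedata.name(chr(code_point))
    return "slot %d: U+%04X %s -> /%s" % (slot, code_point, name, glyph)


for slot in range(len(T1_ACCENTS)):
    print(describe(slot))
```

The point of keeping the two columns separate is that the left one identifies
the character and the right one is merely the name a font is likely to use;
nothing forces the two to agree with the AGL pairing.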

>2) you define compound word mark character as U200C;
>   is this definitely the correct assignment? (maybe it is U200D or U200B?)

U+200B is a space character, even though of width zero. As I understand it,
it is equivalent to \hskip\z@skip. The primary function of both U+200C and
compwordmark is to prevent ligatures from being formed. To the extent that
U+200D does anything in Latin text, it _requests_ that a ligature be
formed. All this is described in Section 13.2 of the Unicode standard.
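
For reference, the three candidates can be listed side by side with a small
Python sketch. The dictionary CANDIDATES and the helper unicode_name are
hypothetical names, and the role descriptions merely restate the paragraph
above; only the character names come from Python's unicodedata module.

```python
import unicodedata

# Role notes paraphrase the discussion above; they are not taken
# from the Unicode character database.
CANDIDATES = {
    0x200B: "a space of width zero",
    0x200C: "prevents a ligature (closest to compwordmark)",
    0x200D: "requests a ligature",
}


def unicode_name(code_point):
    """Look up the official Unicode character name."""
    return unicodedata.name(chr(code_point))


for code_point, role in sorted(CANDIDATES.items()):
    print("U+%04X %-22s %s" % (code_point, unicode_name(code_point), role))
```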

I'm not sure U+200C is definitely the same thing as compwordmark, but it is
the closest equivalent I've been able to find.

Lars Hellström