Unicode and composite characters
Berthold wrote --
> correct me if I'm wrong, but the unicode people have defined the
> lonely accent circumflex? & it's not a linguistic glyph
> (never used for itself)
> The UNICODE people - like any good committee - are not of one mind on this.
> On the one hand they explicitly state (at least in earlier versions)
> that the `non spacing' accents are provided for constructing composites,
> yet also insist on listing all composites that actually occur.
> There are also `spacing' accents, by the way, whatever that is. And
> the non-spacing ones are non-sense since they are meant to accent the
> character that comes before - nobody has thought about the spacing /
> kerning issues.
That is because the details of spacing and kerning are properties
related to typography and glyphs and as such are explicitly and wisely
not the concern of a character encoding.
At an abstract level, the reason why the Unicode standard needs to
contain, and to define the _use_ of, what it calls "non-spacing marks"
and "combining characters" is described in Sections 3.9 and 5.9 of The
Unicode Standard Version 2.0 (A-W 1996).
Particular applications that make use of the decomposed forms are
sorting and searching (see section 5.15); these are (normally, at
least) independent of typographic conventions and glyphs and are the
concern of a character encoding.
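
As a rough illustration of why decomposed forms matter for searching and
sorting, here is a minimal sketch in Python. It assumes the standard
`unicodedata` module, which implements a much later version of the
standard than the one cited above; the helper names `strip_marks` and
`contains_ignoring_marks` are hypothetical and chosen only for this
example. It is not a full collation algorithm, just a demonstration that
these operations work on characters and never consult a font.

```python
import unicodedata

def strip_marks(text):
    """Decompose text canonically (NFD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    # unicodedata.combining() returns a non-zero class for combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def contains_ignoring_marks(haystack, needle):
    """Accent-insensitive substring test (hypothetical helper)."""
    return strip_marks(needle) in strip_marks(haystack)

composed = "\u00ea"      # LATIN SMALL LETTER E WITH CIRCUMFLEX
decomposed = "e\u0302"   # e followed by COMBINING CIRCUMFLEX ACCENT

# The two spellings are different code point sequences, but they agree
# once both are brought to the decomposed form.
same_after_nfd = unicodedata.normalize("NFD", composed) == decomposed

# Sorting on the mark-free key places accented and unaccented spellings
# side by side, whatever glyphs a font would use to draw them.
words = sorted(["r\u00f4le", "rule", "role", "rang"], key=strip_marks)
```

Nothing in the sketch refers to glyphs, spacing, or kerning: the
comparison is made purely on the encoded characters.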
Please do not interpret my defence of Unicode as meaning that I think
that Unicode "gets it all right"; it certainly does not! But the
existence of a dotless j (or i, or anything else) in Unicode is not
closely related to whether a font needs to contain this glyph: it is
only relevant to whether applications concerned only with characters