[XeTeX] Incorrect rendering of Vedic Sanskrit accents
David M. Jones
dmj at dmj.ams.org
Sat May 23 02:46:57 CEST 2015
> Date: Fri, 22 May 2015 22:52:24 +0200
> From: Zdenek Wagner <zdenek.wagner at gmail.com>
> The requirement of the Indic specification is to display the dotted
> circle if the mark cannot be combined.
Aha! Thank you for the pointer. I assume you're referring to this?
Based purely on the text, the situation is still a bit murky, though.
Most seriously, the Indic specification is based on Unicode 3.1 and if
everything in that section is meant to be normative, it's badly
out-of-date with respect to more recent versions of Unicode. For one
thing, it recommends attaching standalone combining marks to a space,
but Unicode now recommends U+00A0 NO-BREAK SPACE for that purpose.
More to the point, the Indic specification says
    Uniscribe displays these marks using the fallback rendering
    mechanism defined in the Unicode Standard (section 5.12,
    'Rendering Non-Spacing Marks' of the Unicode Standard 3.1),
    i.e. positioned on a dotted circle.
First, this is only describing how Uniscribe handles this situation;
it's not clear that makes this behaviour a normative part of the Indic
specification.
Second, that is no longer what Unicode recommends as the default
fallback rendering in this situation:
    In a degenerate case, a nonspacing mark occurs as the first
    character in the text or is separated from its base character by a
    line separator, paragraph separator, or other format character
    that causes a positional separation. This result is called a
    defective combining character sequence (see Section 3.6,
    Combination). Defective combining character sequences should be
    rendered as if they had a no-break space as a base character. (See
    Section 7.9, Combining Marks.)
page 221. (This wording goes back at least as far as Unicode
5.0, where it occurs at the bottom of page 173. Alas, I no
longer have a copy of Unicode 3.0 at home, so I can't check
the exact wording used in it.)
On the other hand, as enjoyable as it is to play language lawyer with
the Unicode specification, I'm happy to concede the point that I
should just precede isolated characters by U+00A0 and everything will
be ok. I'm much more vexed by the malfunctioning Vedic accents. I
live in hope that that can be fixed so I don't have to throw away my
TECkit transliteration engine and start anew with luaTeX.
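
For what it's worth, the "precede isolated characters by U+00A0"
workaround could be applied mechanically as a preprocessing step. The
following is only a rough sketch, not anything XeTeX or TECkit does
itself. The function name add_nbsp_base is made up for illustration,
and the test for "isolated" is my own simplification: a combining mark
counts as isolated if it starts the text or directly follows a control
character, line separator, or paragraph separator. Format characters
such as ZWJ and ZWNJ are deliberately not treated as separators here.

```python
import unicodedata

NBSP = "\u00A0"  # U+00A0 NO-BREAK SPACE

# Assumption: a character in one of these general categories cannot
# serve as a base (control characters such as newline, line separator,
# paragraph separator).
SEPARATOR_CATEGORIES = {"Cc", "Zl", "Zp"}


def is_combining_mark(ch):
    """True for general categories Mn, Mc and Me."""
    return unicodedata.category(ch).startswith("M")


def add_nbsp_base(text):
    """Insert U+00A0 before each combining mark that has no base.

    Hypothetical helper: only the first mark of a defective sequence
    gets a base, so any following marks stack on the same NBSP.
    """
    out = []
    prev = None
    for ch in text:
        has_no_base = (
            prev is None
            or unicodedata.category(prev) in SEPARATOR_CATEGORIES
        )
        if is_combining_mark(ch) and has_no_base:
            out.append(NBSP)
        out.append(ch)
        prev = ch
    return "".join(out)
```

For example, add_nbsp_base("\u0951") gives U+00A0 followed by U+0951
DEVANAGARI STRESS SIGN UDATTA, while a properly based sequence such as
"\u0915\u0951" is returned unchanged.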