[XeTeX] Incorrect rendering of Vedic Sanskrit accents

David M. Jones dmj at dmj.ams.org
Sat May 23 02:46:57 CEST 2015

> Date: Fri, 22 May 2015 22:52:24 +0200
> From: Zdenek Wagner <zdenek.wagner at gmail.com>

> The requirement of the Indic specification is to display the dotted
> circle if the mark cannot be combined.

Aha!  Thank you for the pointer.  I assume you're referring to this?


Based purely on the text, the situation is still a bit murky, though.
Most seriously, the Indic specification is based on Unicode 3.1 and if
everything in that section is meant to be normative, it's badly
out-of-date with respect to more recent versions of Unicode.  For one
thing, it recommends attaching standalone combining marks to a space,
but Unicode now recommends U+00A0 NO-BREAK SPACE for that purpose.

More to the point, the Indic specification says

    Uniscribe displays these marks using the fallback rendering
    mechanism defined in the Unicode Standard (section 5.12,
    'Rendering Non-Spacing Marks' of the Unicode Standard 3.1),
    i.e. positioned on a dotted circle.

First, this only describes how Uniscribe handles this situation;
it's not clear that this makes the behaviour a normative part of the
Indic script specification.

Second, that is no longer what Unicode recommends as the default
fallback rendering in this situation:

    In a degenerate case, a nonspacing mark occurs as the first
    character in the text or is separated from its base character by a
    line separator, paragraph separator, or other format character
    that causes a positional separation. This result is called a
    defective combining character sequence (see Section 3.6,
    Combination). Defective combining character sequences should be
    rendered as if they had a no-break space as a base character. (See
    Section 7.9, Combining Marks.)

        page 221.  (This wording goes back at least as far as Unicode
        5.0, where it occurs at the bottom of page 173.  Alas, I no
        longer have a copy of Unicode 3.0 at home, so I can't check
        the exact wording used in it.)

On the other hand, as enjoyable as it is to play language lawyer with
the Unicode specification, I'm happy to concede the point that I
should just precede isolated characters by U+00A0 and everything will
be ok.  I'm much more vexed by the malfunctioning Vedic accents.  I
live in hope that that can be fixed so I don't have to throw away my
TECkit transliteration engine and start anew with luaTeX.
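
For what it's worth, the U+00A0 workaround can be sketched as a
preprocessing step in a few lines of Python.  This is only an
illustration under my own assumptions: the function name is made up,
"isolated" is taken to mean a combining mark (general category M*)
at the very start of the text or directly after a control character,
line separator or paragraph separator, and nothing here reflects what
XeTeX or TECkit actually do internally.

```python
import unicodedata

NBSP = "\u00A0"  # U+00A0 NO-BREAK SPACE

# Assumption: these general categories count as "causing a positional
# separation" (control characters such as newline, plus the line and
# paragraph separators).  Other format characters are left alone.
BREAKING_CATEGORIES = {"Cc", "Zl", "Zp"}


def add_nbsp_base(text):
    """Insert U+00A0 before combining marks that have no base character.

    Hypothetical helper: a mark is treated as isolated if it is the
    first character of the text or follows a character whose general
    category is in BREAKING_CATEGORIES.
    """
    out = []
    prev = None
    for ch in text:
        is_mark = unicodedata.category(ch).startswith("M")
        isolated = prev is None or unicodedata.category(prev) in BREAKING_CATEGORIES
        if is_mark and isolated:
            out.append(NBSP)
        out.append(ch)
        prev = ch
    return "".join(out)
```

With this, a lone U+0951 DEVANAGARI STRESS SIGN UDATTA gets a no-break
space in front of it, while the same accent following U+0915 is left
untouched.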

