[XeTeX] Bangla font question
David J. Perry
hospes.primus at verizon.net
Fri Mar 11 01:23:51 CET 2011
Sounds like a font problem. I don't know Bangla, but any properly designed
font should render correctly any combinations of characters that ordinarily
appear in the language(s) it is designed to support. Implementing such
complex substitutions as you describe is difficult for font developers, so
we can sympathize a bit, but nonetheless they should work. I suppose there
is a small chance that the ICU renderer does not do something correctly when
dealing with Bangla, but it's more likely the font. Does the font work
correctly outside of XeTeX?
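One way to check that, as a rough sketch only: build the exact sequence you describe (consonant + virama + ZWNJ + consonant + vowel) and dump its code points, so the same string can be pasted into another application that uses the font. The consonants KA and TA below are arbitrary examples of my own choosing, not taken from your data, and the helper names build_cluster and describe are hypothetical.

```python
import unicodedata

# Example characters. KA and TA are arbitrary consonants picked for illustration.
KA = "\u0995"        # BENGALI LETTER KA
TA = "\u09A4"        # BENGALI LETTER TA
VIRAMA = "\u09CD"    # BENGALI SIGN VIRAMA (hashanta)
ZWNJ = "\u200C"      # ZERO WIDTH NON-JOINER
VOWEL_I = "\u09BF"   # BENGALI VOWEL SIGN I, one of the left-appearing vowels


def build_cluster(first, second, vowel, use_zwnj=True):
    """Return first + virama (+ ZWNJ) + second + vowel in logical order."""
    joiner = ZWNJ if use_zwnj else ""
    return first + VIRAMA + joiner + second + vowel


def describe(text):
    """List each code point with its Unicode name, to confirm what was typed."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text]


if __name__ == "__main__":
    sample = build_cluster(KA, TA, VOWEL_I)
    for line in describe(sample):
        print(line)
    # Paste this string into another application set to the same font.
    print(sample)
```

If the string renders the same way in another application, that points toward the font (or the shaping engine that application uses) rather than XeTeX itself.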
----- Original Message -----
From: "maxwell" <maxwell at umiacs.umd.edu>
To: "Unicode-based TeX for Mac OS X and other platforms" <xetex at tug.org>
Sent: Thursday, March 10, 2011 6:59 PM
Subject: [XeTeX] Bangla font question
> We're publishing a grammar of Bangla, which uses the Bengali script block
> of Unicode. We're running into a problem with the appearance of certain
> vowel characters, which are supposed to appear to the *left* of the
> consonant that they're pronounced after. These include U+09BF, U+09C7 and
> U+09C8. (U+09CB and U+09CC are similar.) (Those of you who studied
> transformational grammar may be reminded of the "affix hopping" rule.)
> Normally this works just fine. The display rules are somewhat complex,
> because the Bangla writing system is one of those that has a default vowel.
> Specifically, a consonant letter which is not followed by an overt vowel
> sign in the writing is assumed to be followed by the default vowel in
> speech. If a consonant is *not* followed by a vowel in speech, i.e. if it
> is followed by another consonant (i.e. it's the first consonant in a
> consonant cluster), then you're supposed to put a special virama (or
> hashanta) mark under the consonant--a diacritic to indicate that there's
> no vowel following.
> When a consonant + virama appears at the end of the word, the virama would
> appear overtly. In the rendering of Unicode text, a consonant + virama +
> consonant is often replaced on-screen or in print by a conjunct consonant,
> which is a kind of double consonant (analogous to English x = ks, but
> composed of pieces of the two consonant characters in Bangla). Not all
> fonts have all conjunct consonants, and when a font lacks a particular
> conjunct, the expected representation on-screen or in print is generally
> the underlying representation, i.e. consonant + virama + consonant.
> There is one exception to the contraction of consonant + virama +
> consonant into conjunct consonant, and that is when there's a morpheme
> boundary between the two consonants (i.e. the first consonant is in the
> stem, and the second consonant is in a suffix). In this case, the expected
> appearance on-screen or in print would be consonant + virama + consonant,
> i.e. what you'd get if the font didn't have a conjunct consonant. In order
> to force this behavior, Unicode uses a ZWNJ (Zero Width Non-Joiner); the
> underlying sequence
> consonant + virama + ZWNJ + consonant
> is output as
> consonant + virama + consonant
> rather than as a two-consonant conjunct.
> If one of these vowels that hops leftward (U+09BF, U+09C7 and U+09C8) is
> preceded by a conjunct consonant (underlyingly a sequence of consonant +
> virama + consonant), then the vowel hops leftward over the conjunct.
> So far, so good.
> However, a problem arises when consonant clusters occur across morpheme
> boundaries *and* the second consonant is followed by one of the vowels
> that are supposed to appear to the left of the consonant they're pronounced
> after. In this case, we're told that the vowel sign should appear
> *between* the two consonants, rather than to the left of both consonants.
> In other words, the underlying sequence
> consonant + virama + ZWNJ + consonant + vowel
> should render as
> consonant + virama + vowel + consonant
> when the vowel in question is one of those that shows up to the left.
> (The ZWNJ of course doesn't appear in print.) But instead, we get
> vowel + consonant + virama + consonant
> which is said to be more or less un-readable.
> I've tried numerous combinations of characters to get this to work, to no
> avail. The one which perhaps came the closest was to use an optional
> hyphen (U+00AD) after the virama. This prevented the vowel from moving
> far left--unfortunately, the Bangla font we're using doesn't have this
> character, so the optional hyphen showed up as a box (indicating a missing
> character in the font). I've also tried including Zero Width Space (U+200B),
> which was simply ignored (perhaps by XeTeX?).
> Suggestions? Is there a way in XeTeX to prevent the vowel sign from
> hopping over a ZWNJ? Or is the problem in the font? That wouldn't be
> surprising, since as I say the virama is usually omitted in text written
> for native speakers, so this problem seldom comes up. We're writing it in
> our grammar for the edification of non-native speakers.
> Mike Maxwell