[XeTeX] Bangla font question

maxwell maxwell at umiacs.umd.edu
Tue Mar 15 22:16:39 CET 2011

In the messages below, I asked about a problem I was having with a
Bengali font, and David suggested it might be a font problem.
I've done some more testing, and it appears that is *not* the cause.

I tried three different fonts in XeLaTeX: Vrinda, Lohit Bengali, and
Bangla. (Vrinda is from MsWindows, while the latter two were downloaded
from somewhere.) I also tried the UniBangla font, but when I include
[Script=Bengali] in the call to \newfontfamily with that font, I get an
error 11 in the "driver return code", and the resulting PDF is not
readable.  Without that parameter, Bengali words don't appear correctly
(vowels don't appear in the correct place, etc.).

With the Lohit and Bangla fonts in XeLaTeX, I get the same result: the
vowel sign E, which should hop over just the first consonant to its left,
instead hops over both (and over the ZWNJ between the two consonants).
With the Vrinda font, not only does the E hop over two consonants, the ZWNJ
shows up as a space character (not a zero width character).  (The Vrinda
font is probably not licensed for general use on a Linux system, which is
where I created these files, but this was just for test purposes.)

All three fonts work fine in MsWord: the vowel sign E correctly hops over
just the first consonant to its left.

I'm attaching three files: BanglaBug.pdf is the result of running xelatex
on the file BanglaBug.xetex (a minimal file), while BanglaBugWord.pdf is a
PDF of the MsWord (2007) version.  (I created the Word file by copying the
word from the .xetex file into Word.)  The vowel sign E is the thing that
looks like a left-hand paren hanging down from the horizontal line.  It's
the second obvious character in the PDF created by XeTeX, but the third
character in the PDF output by Word.

I have to admit that this was done with the TeXLive 2009 version of xetex.
I need to get the 2010 version installed...

Other thoughts?

Mike Maxwell

On Thu, 10 Mar 2011 19:23:51 -0500, "David J. Perry"
<hospes.primus at verizon.net> wrote:
> Sounds like a font problem.  I don't know Bangla, but any properly
> designed font should render correctly any combinations of characters
> that ordinarily appear in the language(s) it is designed to support.
> Implementing such complex substitutions as you describe is difficult
> for font developers, so we can sympathize a bit, but nonetheless
> they should work. I suppose there is a small chance that the ICU
> renderer does not do something correctly when dealing with Bangla,
> but it's more likely the font.  Does the font work
> correctly outside of XeTeX?
>
> ----- Original Message -----
> From: "maxwell" <maxwell at umiacs.umd.edu>
> To: "Unicode-based TeX for Mac OS X and other platforms" <xetex at tug.org>
> Sent: Thursday, March 10, 2011 6:59 PM
> Subject: [XeTeX] Bangla font question
>
>
>> We're publishing a grammar of Bangla, which uses the Bengali script
>> block of Unicode.  We're running into a problem with the appearance of
>> certain vowel characters, which are supposed to appear to the *left* of
>> the consonant that they're pronounced after.  These include U+09BF,
>> U+09C7 and U+09C8.  (U+09CB and U+09CC are similar.)  (Those of you who
>> studied transformational grammar may be reminded of the "affix hop
>> transformation.")
>>
>> Normally this works just fine.  The display rules are somewhat complex,
>> because the Bangla writing system is one of those that has a default
>> vowel.  Specifically, a consonant letter which is not followed by an
>> overt vowel sign in the writing is assumed to be followed by the default
>> vowel in speech.  If a consonant is *not* followed by a vowel in speech,
>> i.e. if it is followed by another consonant (i.e. it's the first
>> consonant in a consonant cluster), then you're supposed to put a special
>> virama (or hashanta) mark under the consonant--a diacritic to indicate
>> that there's no vowel following.
>>
>> When a consonant + virama appears at the end of the word, the virama
>> would appear overtly.  In the rendering of Unicode text, a consonant +
>> virama + consonant is often replaced on-screen or in print by a conjunct
>> consonant, which is a kind of double consonant (analogous to English
>> x = ks, but often composed of pieces of the two consonant characters in
>> Bangla).  Not all fonts have all conjunct consonants, and when a font
>> lacks a particular conjunct, the expected representation on-screen or in
>> print is generally the underlying representation, i.e. consonant +
>> virama + consonant.
>>
>> There is one exception to the contraction of consonant + virama +
>> consonant into conjunct consonant, and that is when there's a morpheme
>> boundary between the two consonants (i.e. the first consonant is in the
>> stem, and the second consonant is in a suffix). In this case, the
>> expected appearance on-screen or in print would be consonant + virama +
>> consonant, i.e. what you'd get if the font didn't have a conjunct
>> consonant. In order to force this behavior, Unicode uses a ZWNJ (Zero
>> Width Non-Joiner); the underlying sequence
>>   consonant + virama + ZWNJ + consonant
>> is output as
>>   consonant + virama + consonant
>> rather than as a two-consonant conjunct.
>>
>> If one of these vowels that hops leftward (U+09BF, U+09C7 and U+09C8)
>> is preceded by a conjunct consonant (underlyingly a sequence of
>> consonant + virama + consonant), then the vowel hops leftward over the
>> conjunct.
>>
>> So far, so good.
>>
>> However, a problem arises when consonant clusters occur across morpheme
>> boundaries *and* the second consonant is followed by one of the vowel
>> signs that is supposed to appear to the left of the consonant it's
>> pronounced after.  In this case, we're told that the vowel sign should
>> appear *between* the two consonants, rather than to the left of both
>> consonants.  In other words, the underlying sequence
>>   consonant + virama + ZWNJ + consonant + vowel
>> should render as
>>   consonant + virama + vowel + consonant
>> when the vowel in question is one of those that shows up to the left.
>> (The ZWNJ of course doesn't appear in print.) But instead, we get
>>   vowel + consonant + virama + consonant
>> which is said to be more or less un-readable.
>>
>> I've tried numerous combinations of characters to get this to work, to
>> no avail.  The one which perhaps came the closest was to use an optional
>> hyphen (U+00AD) after the virama.  This prevented the vowel from moving
>> too far left--unfortunately, the Bangla font we're using doesn't have
>> this character, so the optional hyphen showed up as a box (indicating a
>> missing character in the font). I've also tried including a Zero Width
>> Space (U+200B), which was simply ignored (perhaps by XeTeX?).
>>
>> Suggestions? Is there a way in XeTeX to prevent the vowel sign from
>> hopping over a ZWNJ?  Or is the problem in the font?  That wouldn't be
>> surprising, since as I say the virama is usually omitted in text
>> written for native speakers, so this problem seldom comes up.  We're
>> writing it in our grammar for the edification of non-native speakers.
>>
>>   Mike Maxwell
>>
>>
>> --------------------------------------------------
>> Subscriptions, Archive, and List information, etc.:
>>  http://tug.org/mailman/listinfo/xetex
-------------- next part --------------
A non-text attachment was scrubbed...
Name: BanglaBug.pdf
Type: application/pdf
Size: 20719 bytes
Desc: not available
URL: <http://tug.org/pipermail/xetex/attachments/20110315/bb4885c3/attachment-0002.pdf>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: BanglaBugWord.pdf
Type: application/pdf
Size: 23572 bytes
Desc: not available
URL: <http://tug.org/pipermail/xetex/attachments/20110315/bb4885c3/attachment-0003.pdf>
-------------- next part --------------
\documentclass[letterpaper]{report}
\usepackage{xltxtra}
\usepackage{fontspec}

%Bengali fonts:
\newfontfamily\bengalifontA[Script=Bengali]{Bangla}
%This font has a problem with the placement of vowel glyphs that hop over
% a consonant to their left, when there are two consonants separated by a
% ZWNJ (to prevent conjunct formation, needed at morpheme boundaries).
\newfontfamily\bengalifontB[Script=Bengali]{Lohit Bengali}
%Like the Bangla font, here the vowel sign E incorrectly hops left
% over two consonants separated by a ZWNJ.
%\newfontfamily\bengalifontC[Script=Bengali]{UniBangla}
%With the above line, we get an error
%   Error 11 (driver return code) generating output;
%   file BanglaBug.pdf may not be valid.
\newfontfamily\bengalifontC{UniBangla}
%With the above line, we don't get the error 11, but
% vowel letters that should hop left don't.
\newfontfamily\bengalifontD[Script=Bengali]{Vrinda}
%Looks correct in MsWord, but in XeTeX's output the vowel sign E incorrectly
% hops left over two consonants, and the ZWNJ shows up as some kind of
% space character.

\begin{document}
Using \XeTeX{}

{\bengalifontA শুন্‌বেন্} Bangla font

{\bengalifontB শুন্‌বেন্} Lohit Bengali font

{\bengalifontC শুন্‌বেন্} UniBangla

{\bengalifontD শুন্‌বেন্} Vrinda
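
%--- Supplementary sketch, not part of the original test ---
% The lines below spell the test word out code point by code point, using
% XeTeX's ^^^^xxxx input notation (four lowercase hex digits).  This makes
% the ZWNJ (U+200C) visible in the source, so it cannot be lost silently
% in an editor or through copy and paste.
% Assumption: the test word above is the sequence
%   U+09B6 U+09C1 U+09A8 U+09CD U+200C U+09AC U+09C7 U+09A8 U+09CD.
% \bengalitestword and \bengalicontrolword are hypothetical names chosen
% for this sketch only.
\newcommand\bengalitestword{^^^^09b6^^^^09c1^^^^09a8^^^^09cd^^^^200c^^^^09ac^^^^09c7^^^^09a8^^^^09cd}
% Control: the same sequence without the ZWNJ.  If the font has the
% conjunct, the vowel sign E (U+09C7) is expected to hop left over the
% whole cluster here, so the two lines should differ when the ZWNJ is
% being honored.
\newcommand\bengalicontrolword{^^^^09b6^^^^09c1^^^^09a8^^^^09cd^^^^09ac^^^^09c7^^^^09a8^^^^09cd}

{\bengalifontB \bengalitestword} Lohit Bengali, explicit ZWNJ

{\bengalifontB \bengalicontrolword} Lohit Bengali, no ZWNJ (control)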
\end{document}