[XeTeX] Polyglossia and Sanskrit
arthur.reutenauer at normalesup.org
Sat Oct 11 20:10:15 CEST 2008
> I think the same file could be used for all scripts deriving from
> Brahmi, even Grantha.
There seem to be two different issues here:
* Can we use the same file for different scripts?
* Can we use the same abstract patterns for different scripts?
From what I understand, you're discussing the second one, but that's
of no concern to XeTeX for the moment: the word-breaking algorithm
doesn't care that the Devanagari virama behaves the same way as the
Gujarati virama as far as hyphenation is concerned, because they are two
different Unicode characters, and XeTeX sees no connection between them
(other than the difference between their Unicode code points being a
multiple of 128 ;-)
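To make the point about code points concrete, here is a small Python sketch (not anything XeTeX itself runs) that prints the two viramas and the distance between them. The helper name `codepoint_gap` is made up for this illustration.

```python
import unicodedata

DEVANAGARI_VIRAMA = "\u094D"
GUJARATI_VIRAMA = "\u0ACD"

def codepoint_gap(a, b):
    """Absolute distance between the code points of two characters."""
    return abs(ord(a) - ord(b))

# Two distinct characters as far as any pattern matcher is concerned.
for ch in (DEVANAGARI_VIRAMA, GUJARATI_VIRAMA):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

gap = codepoint_gap(DEVANAGARI_VIRAMA, GUJARATI_VIRAMA)
print(gap, gap % 128 == 0)
```

The two characters compare as unequal, so a pattern written with one will never match text written with the other.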
As far as the first question is concerned, though, the answer is very
simple: we can use the same file for any number of different scripts,
since the set of Unicode characters entailed by each of them is
distinct, and the different patterns simply don't interact across
scripts. Likewise, we could use the same pattern file for, say,
English, Russian, Armenian, Georgian and Ancient Greek, since they all
use non-overlapping sets of characters in their spelling; but that would
be rather pointless. On the other hand, using the same file for
different scripts of Sanskrit is very meaningful, and I think that is
the right approach.
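As a rough illustration of why patterns for several scripts can share one file, the following Python sketch groups the characters of a few patterns by script and checks that the groups do not overlap. The patterns are invented for the example and are not real Sanskrit patterns; taking the first word of the Unicode character name as a script label is only a convenient heuristic, and `script_of` and `chars_by_script` are hypothetical helper names.

```python
import unicodedata

def script_of(ch):
    """Rough script label: first word of the Unicode character name."""
    return unicodedata.name(ch).split()[0]

def chars_by_script(patterns):
    """Group the non-digit characters of each pattern by script label."""
    groups = {}
    for pattern in patterns:
        for ch in pattern:
            if ch in "0123456789.":
                continue  # skip hyphenation weights and word-boundary dots
            groups.setdefault(script_of(ch), set()).add(ch)
    return groups

# Made-up patterns, for illustration only.
patterns = [
    "1\u0915",   # Devanagari letter KA
    "2\u094D1",  # Devanagari virama
    "1\u0A95",   # Gujarati letter KA
    "2\u0ACD1",  # Gujarati virama
]

groups = chars_by_script(patterns)
print(sorted(groups))
print(groups["DEVANAGARI"].isdisjoint(groups["GUJARATI"]))
```

Since no character belongs to both groups, a word in one script can only ever be matched by the patterns written for that script.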
> If we prohibit a break between two consonants
> which are graphically realised as a conjunct, can we allow it between
> two consonants to which no conjunct corresponds? The same issue
> arises in Devanagari as well.
That is a different question altogether, although certainly a
legitimate one; and whatever you decide, you don't need to stick to a
single behaviour for all the scripts, since the patterns are independent
from one script to another.
> Kharoshthi will perhaps need separate hyphenation patterns, but we
> only have a proposal of encoding at the moment and we must wait and
> see how it will be encoded in the end.
Kharoshthi has been in Unicode for years (version 4.1, March 2005).
It's encoded at positions U+10A00 to U+10A5F; see the code chart at
http://www.unicode.org/charts/PDF/U10A00.pdf and the description in
§ 10.6 of the book (http://www.unicode.org/versions/Unicode5.0.0/ch10.pdf,
page 26). Do you mean something other than the Kharoshthi script that is
encoded there?
And again, if you want to add patterns for Sanskrit written in
Kharoshthi, you can do that in the same file without any problem.
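If you want to check whether a given character falls in the Kharoshthi block, a quick Python sketch could look like this. It assumes the block runs from U+10A00 to U+10A5F and relies on whatever version of the Unicode database ships with your Python; `in_kharoshthi_block` is a name invented for the example.

```python
import unicodedata

KHAROSHTHI_FIRST = 0x10A00
KHAROSHTHI_LAST = 0x10A5F

def in_kharoshthi_block(ch):
    """True if the character's code point lies in the Kharoshthi block."""
    return KHAROSHTHI_FIRST <= ord(ch) <= KHAROSHTHI_LAST

# List the code points in the block that have a name in this Unicode database.
for cp in range(KHAROSHTHI_FIRST, KHAROSHTHI_LAST + 1):
    name = unicodedata.name(chr(cp), "")
    if name:
        print(f"U+{cp:05X} {name}")
```

Patterns for Sanskrit in Kharoshthi would use only characters from this range, so they could sit next to the Devanagari ones without interfering.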
>> Moreover, for grantha, there is not yet an unicode scheme for this
>> script and the only fonts I found uses the unicode Bengali scheme!
> As far as I know Grantha has not even been submitted to the Unicode
> Consortium.
Indeed; I couldn't even find a serious discussion of the script on the
Unicode mailing-lists. Note that if you're interested in having it
encoded, you are welcome to submit a proposal, or even an informal
discussion of it. As with any ancient script, it is not going to be
included in Unicode unless experts seize the issue. One of the people
to talk to is certainly N. Ganesan (naa.ganesan at gmail.com), who is very
active in the Consortium's work on Indic scripts, and can convey
your opinions to the relevant people.