[XeTeX] Hyphenated, transliterated Sanskrit.

Manuel B. flammschild at googlemail.com
Sun Nov 21 10:22:55 CET 2010

I'm glad to hear that there is finally some implementation of Roman
transliteration in the Sanskrit hyphenation patterns. Keep up the good work!

While I was checking hyph-sa.tex, I wondered about two things (which are
irrelevant to Dominik's problem):

1) I saw that all diacritics used for IAST appear in the patterns,
while some of them (for example ṛ and ṝ) are marked as "non-standard
transliteration". That is OK, insofar as IAST is not a standard in the
official sense. But IAST is the most commonly used scheme, and the
"standard" transliteration of vocalic r in IAST is ṛ, not r̥.

The latter belongs to ISO 15919, the international standard for the
transliteration of Indic scripts. So if ISO 15919 is to be taken into
account in the Sanskrit hyphenation patterns, it should be done
completely. This means that, for example, ṁ should also be added, and
ṃ marked as "non-standard transliteration", and so on.
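The IAST/ISO 15919 differences mentioned above come down to a small character mapping. The Python sketch below is only an illustration and is not part of hyph-sa.tex. It assumes that one would fold ISO 15919 input into IAST before hyphenation rather than list both spellings in the patterns, it covers only lowercase vocalic r/l and the anusvara, and the function name iso15919_to_iast is hypothetical.

```python
import unicodedata

# ISO 15919 spellings (left) and IAST equivalents (right) for Sanskrit.
# The longer sequences come first so that r + ring below + macron is
# replaced before plain r + ring below.
ISO_TO_IAST = [
    ("r\u0325\u0304", "\u1e5d"),  # r̥̄ -> ṝ (long vocalic r)
    ("l\u0325\u0304", "\u1e39"),  # l̥̄ -> ḹ (long vocalic l)
    ("r\u0325", "\u1e5b"),        # r̥  -> ṛ (vocalic r)
    ("l\u0325", "\u1e37"),        # l̥  -> ḷ (vocalic l)
    ("\u1e41", "\u1e43"),         # ṁ  -> ṃ (anusvara)
]


def iso15919_to_iast(text: str) -> str:
    """Hypothetical helper: rewrite ISO 15919 Sanskrit letters as IAST."""
    # NFC composes m + dot above into ṁ and puts combining marks in
    # canonical order; r + ring below has no precomposed form and stays as is.
    text = unicodedata.normalize("NFC", text)
    for iso, iast in ISO_TO_IAST:
        text = text.replace(iso, iast)
    return text


print(iso15919_to_iast("sa\u1e41skr\u0325ta"))  # saṃskṛta
```

IAST input passes through unchanged, so such a step could sit in front of a single IAST-only pattern set.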

But I don't know how far one can go here. While IAST is meant
exclusively for transliterating Sanskrit (I know that it's also used
for Pali, but in a slightly different way), ISO 15919 contains far more
diacritics than are needed to transliterate Sanskrit. It is rather
meant as a transliteration scheme for many or most Indian languages.
Should it then be duplicated in the hyphenation patterns of every
language in question?

2) This might be a stupid question, but aren't the hyphenation patterns
for most abugida scripts more or less the same? That would mean that
hyphenation depends on the script rather than on the language. If the
patterns are organized by language, lots of them have to be duplicated,
whereas one could have a single hyphen-indic.tex instead.
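A single script-independent source is plausible partly because Unicode lays out the main Indic script blocks in parallel (following the ISCII layout), so corresponding letters sit at the same offset inside each 128-code-point block. The Python sketch below uses the hypothetical name transpose_pattern and shifts the Devanagari characters of a TeX pattern into another script's block. It is an illustration only, not how any existing pattern file is generated. It raises an error when the target slot is unassigned (Tamil, for instance, has no letters for the aspirated stops).

```python
import unicodedata

DEVANAGARI_START = 0x0900
BLOCK_SIZE = 0x80  # each of these Indic blocks spans 128 code points

# First code point of each parallel block, from the Unicode code charts.
SCRIPT_BLOCKS = {
    "bengali": 0x0980,
    "gurmukhi": 0x0A00,
    "gujarati": 0x0A80,
    "oriya": 0x0B00,
    "tamil": 0x0B80,
    "telugu": 0x0C00,
    "kannada": 0x0C80,
    "malayalam": 0x0D00,
}


def transpose_pattern(pattern: str, script: str) -> str:
    """Shift the Devanagari letters of a TeX pattern into another Indic block.

    Pattern syntax (digits and dots) is copied unchanged.
    """
    offset = SCRIPT_BLOCKS[script] - DEVANAGARI_START
    out = []
    for ch in pattern:
        cp = ord(ch)
        if DEVANAGARI_START <= cp < DEVANAGARI_START + BLOCK_SIZE:
            cp += offset
            # Not every slot is filled in every script; refuse to guess.
            if unicodedata.name(chr(cp), None) is None:
                raise ValueError(f"{ch!r} has no counterpart in {script}")
        out.append(chr(cp))
    return "".join(out)


# Devanagari "1कि" becomes Bengali "1কি".
print(transpose_pattern("1\u0915\u093f", "bengali"))
```

The check on unassigned code points matters because the blocks are parallel but not identical, so a generated file would still need review by someone who knows each script.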

Have a nice weekend!

2010/11/21 Dominik Wujastyk <wujastyk at gmail.com>:
> That's extremely helpful!  Thank you, Arthur.
> I've upped the first argument of hyphenmins to 2, which helps a lot for
> romanisation, but may make the Nagari breaks more difficult.  I suppose it's
> not reasonable to assume that hyphenation parameters will be the same across
> different scripts.
> Best,
> Dominik
> On 20 November 2010 22:12, Arthur Reutenauer
> <arthur.reutenauer at normalesup.org> wrote:
>> > I'm really not sure what I'm getting as a result. It looks as if it's
>> > roman script being hyphenated as if it were Devanagari. The initial a- of
>> > several words, like arhasi, gets separated (a-rhasi), which might just
>> > about look okay in Nagari, but not in romanisation. Am I actually getting
>> > the right thing
>>  You're indeed getting what the patterns say.  From what I read in
>> hyph-sa.tex, the patterns allow breaks after any vowel (but not inside
>> diphthongs), and forbid them before final consonants or consonant
>> clusters; and that's about it.  It's certainly a debatable choice, but
>> it does seem like the patterns really aim at mimicking the way (say)
>> Sanskrit written using Devanagari is hyphenated.  You would have to take
>> this up with Yves.
>> > Why do I have to pretend that this is Devanagari (\devanagarifont)?
>>  This is by design in polyglossia (see gloss-sanskrit.ldf).  You would
>> have to take this up with François.  (And I'm the one responsible for
>> integrating hyph-sa.tex into hyph-utf8.  Why does it seem like there is
>> a French mafia around Sanskrit support in XeTeX? ;-)
>>        Arthur
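The behaviour Arthur describes can be approximated in a few lines of Python: breaks after any vowel, none inside the diphthongs ai and au, and none before a word-final consonant or cluster. The sketch also applies the minimum counts that TeX's \lefthyphenmin and \righthyphenmin enforce, assuming that the first hyphenmins argument Dominik raised is the left minimum. This is a rule-based sketch over IAST text, not the Liang patterns in hyph-sa.tex. Keeping ṃ and ḥ with the preceding vowel is an assumption, as are the defaults 1 and 2.

```python
import unicodedata

# IAST vowel nuclei. The diphthongs "ai" and "au" come first so they are
# matched as one unit and never split.
VOWELS = ["ai", "au", "a", "ā", "i", "ī", "u", "ū",
          "ṛ", "ṝ", "ḷ", "ḹ", "e", "o"]
# Assumption: anusvara and visarga stay with the preceding vowel.
MODIFIERS = ("ṃ", "ḥ")


def vowel_ends(word: str) -> list[int]:
    """Offsets just after each vowel nucleus (plus any ṃ/ḥ) in an NFC word."""
    ends = []
    i = 0
    while i < len(word):
        match = next((v for v in VOWELS if word.startswith(v, i)), None)
        if match is None:
            i += 1  # a consonant: no break right after it
            continue
        i += len(match)
        while i < len(word) and word[i] in MODIFIERS:
            i += 1
        ends.append(i)
    return ends


def hyphenate(word: str, lefthyphenmin: int = 1, righthyphenmin: int = 2) -> str:
    """Insert '-' at every break the vowel-based rule allows."""
    word = unicodedata.normalize("NFC", word)
    pieces, start = [], 0
    for p in vowel_ends(word):
        rest = word[p:]
        if not vowel_ends(rest):
            continue  # end of word, or only final consonants would remain
        if p < lefthyphenmin or len(rest) < righthyphenmin:
            continue  # too close to either edge of the word
        pieces.append(word[start:p])
        start = p
    pieces.append(word[start:])
    return "-".join(pieces)


print(hyphenate("arhasi"))                   # a-rha-si
print(hyphenate("arhasi", lefthyphenmin=2))  # arha-si
```

With a left minimum of 1 the rule reproduces the a-rhasi break Dominik saw, and raising it to 2 removes that break while keeping arha-si.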
