[XeTeX] Hyphenated, transliterated Sanskrit.

Sun Nov 21 22:34:55 CET 2010

Hello.

On 21 Nov 2010, at 10:22, Manuel B. wrote:

> While I was checking hyphen-sa.tex, I wondered two things (which are
> irrelevant to Dominik's problem):
> 
> 1) I saw that all diacritics used for IAST appear in the patterns,
> while some of them (for example ṛ and ṝ) are marked as "non standard
> transliteration". That is OK, insofar as IAST is not a standard in the
> official sense. But IAST is the most commonly used scheme, and the
> "standard" transliteration of vocalic r in IAST is ṛ, not r̥.
> 
> The latter belongs to the international standard transliteration of
> Indic scripts, defined as ISO 15919. So if ISO 15919 is to be taken
> into account for the Sanskrit hyphenation patterns, it should be done
> completely. This means that, for example, ṁ should also be added,
> and ṃ marked as "non standard transliteration", and so on.

I agree with you on both points.

The comments you mention were merely notes to myself (what we call in French a "pense-bête" :), but since they can be read by other people they should be clearer, and I'll use IAST or ISO 15919 instead of "non standard" and (implicitly) "standard".

I'll also add the missing characters, ṁ, ẖ, ḫ and the sign for anudātta (I think that's all, as far as Sanskrit is concerned).
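
To make the difference between the two schemes concrete, here is a minimal Python sketch that rewrites the IAST characters mentioned above into their ISO 15919 counterparts. The table and the function name `iast_to_iso15919` are hypothetical and only for illustration; the table covers just the characters discussed in this thread (ṛ, ṝ, ṃ), not the full set of differences between IAST and ISO 15919, and it has nothing to do with how hyphen-sa.tex itself is written.

```python
import unicodedata

# Hypothetical, deliberately partial table: IAST character -> ISO 15919 spelling.
# Only the characters discussed in this thread are listed.
IAST_TO_ISO15919 = {
    "\u1E5B": "r\u0325",        # ṛ (r with dot below) -> r + combining ring below
    "\u1E5D": "r\u0325\u0304",  # ṝ -> r + combining ring below + combining macron
    "\u1E43": "\u1E41",         # ṃ (m with dot below) -> ṁ (m with dot above)
}


def iast_to_iso15919(text):
    """Rewrite the IAST characters listed above into ISO 15919 form."""
    # Compose first, so that "r" + combining dot below is seen as U+1E5B.
    composed = unicodedata.normalize("NFC", text)
    return "".join(IAST_TO_ISO15919.get(ch, ch) for ch in composed)


if __name__ == "__main__":
    for word in ("k\u1E5B\u1E63\u1E47a", "sa\u1E43sk\u1E5Bta"):
        print(word, "->", iast_to_iso15919(word))
```

Note that the ISO 15919 vocalic r has no precomposed Unicode character: it is a plain r followed by U+0325 (combining ring below), so a pattern file that supports both schemes has to handle a base letter plus combining mark in addition to the precomposed IAST letters.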

> But I don't know how far one can go here. While IAST is meant
> exclusively for Sanskrit transliteration (I know that it's used for
> Pali also, but in a slightly different way), ISO 15919 contains far
> more diacritics than are needed for the transliteration of Sanskrit.
> It's rather meant as a transliteration of many or most Indian
> languages. Should it then be duplicated in every hyphenation pattern
> file of every language in question?
> 
> 2) That might be a stupid question, but aren't hyphenation patterns
> for most abugida scripts more or less the same? That would mean that
> hyphenation is script-dependent rather than language-dependent. Lots
> of hyphenation patterns have to be duplicated if they are organized by
> language, whereas one could have a hyphen-indic.tex instead.

Arthur and Mojca are better qualified than I to answer those questions. What comes to mind is that such a "total" hyphenation file might rapidly become difficult to maintain, all the more so as it would require several maintainers. Besides, some languages might require special rules, exceptions for instance, which could be unwanted in another language using the same script.

Arthur and Mojca, what do you think?

Regards,

Yves