[XeTeX] Hyphenated, transliterated Sanskrit.

Mojca Miklavec mojca.miklavec.lists at gmail.com
Mon Nov 22 10:27:08 CET 2010

On Sun, Nov 21, 2010 at 22:34, Yves Codet wrote:
> Le 21 nov. 2010 à 10:22, Manuel B. a écrit :
>> But I don't know how far one can go here. While IAST is meant
>> exclusively for Sanskrit transliteration (I know that it's used for
>> Pali also, but in a slightly different way), ISO 15919 contains far
>> more diacritics than are needed for the transliteration of Sanskrit.
>> It's rather meant as a transliteration of many or most Indian
>> languages. Should it be duplicated then in every hyphenation pattern
>> of every language in question?
>> 2) That might be a stupid question, but aren't hyphenation patterns
>> for most abugida scripts more or less the same? That means the
>> hyphenation is script-dependent rather than language-dependent. Lots
>> of hyphenation patterns have to be duplicated if they are ordered by
>> language, whereas one could have a hyphen-indic.tex instead.
> Arthur and Mojca are better qualified than I to answer those questions. What comes to mind is that such a "total" hyphenation file might rapidly become difficult to maintain, all the more so as it would require several maintainers. Besides, some languages might require special rules, exceptions for instance, which could be unwanted in another language using the same script.
> Arthur and Mojca, what do you think?


Right now we are discussing whether to use one-pattern-per-language or
one-pattern-per-script for the Ethiopic script, which was requested
recently on the XeTeX mailing list. For the Ethiopic script, however,
we made the first version of the patterns ourselves, so at least I
know exactly what is in them (which is not the case for Indic
scripts).

In the case of Indic scripts, all I did was fetch the patterns from
OpenOffice and repackage them for use in TeX.

There might be a reason for language-dependent ordering in OpenOffice,
since it applies patterns based on language. Having a single file for
patterns in OOo would mean duplicating that same file ten times, I
guess. In TeX one can reuse the same file for multiple languages more
easily.
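
One way to picture that reuse, as a sketch only: in a language.dat
file, a line starting with "=" declares a synonym for the language
listed just before it, so several language names can share one set of
loaded patterns. The name "indic", the file name loadhyph-indic.tex and
the choice of synonyms below are hypothetical and only illustrate the
idea; no such generic file is described in this thread.

```tex
% language.dat fragment (illustrative sketch, not an existing setup)
% "indic" and loadhyph-indic.tex are hypothetical names.
indic loadhyph-indic.tex
% Each "=" line is a synonym for the entry above, so these names
% would all select the same patterns without loading the file again.
=hindi
=marathi
```

Whether sharing patterns like this is linguistically appropriate is
exactly the open question; the fragment only shows that the mechanism
exists on the TeX side.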

From my perspective we are the coordinators & collectors of
hyphenation patterns. We are not specialists in every language that
is maintained in our repository, which means that we still need
someone to create the patterns for the language he/she masters.

If Indic scripts hyphenate in the same way in all the languages that
use the script, then in principle I have nothing against having a
single file that would cover them all, but only if that really brings
some benefit and in that case probably somebody else should do it.
Does anyone require a language that is not present in the repository,
but would be covered by "generic Indic script" hyphenation rules?

If (for example) the author of the OpenOffice files were to prepare
and maintain the file and thus guarantee behaviour compatible with
OOo, that would be the best option. But first of all, the question is:
what would be the biggest benefit? New languages?

The rest of the thread was about Sanskrit.


PS: if any other language specialist could offer some more answers
about Ethiopic scripts, feel free to reply to me and Arthur off-list.
