[XeTeX] Sanskrit hyphenation

Yves Codet ycodet at club-internet.fr
Tue Mar 29 09:23:32 CEST 2005


Hello.

On 28 March 2005, at 16:12, somadevah at aol.com wrote:

> One question. The sanhyph.tex file defines patterns for Devanagari 
> script only. Would it not make sense to add other scripts too (using 
> the same patterns)? In Southern India Sanskrit is often written with 
> local scripts (as it is sometimes in Bengal etc)...

It's a good idea, but I don't know all of those scripts very well, and 
the patterns would have to be checked. Besides, I wonder how the initial 
loop:

\newcount\n \n="0901
\loop \lccode\n=\n \ifnum\n<"0963 \advance\n by 1 \repeat

can be modified so that it goes through 0981--09CD (Bengali), 
0B82--0BCD (Tamil)... 0D02--0D4D (Malayalam). Or could it simply be:

\newcount\n \n="0901
\loop \lccode\n=\n \ifnum\n<"0D4D \advance\n by 1 \repeat
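
A possible way to handle several disjoint ranges, as an untested sketch: 
wrap the loop in a small macro and call it once per block. The name 
\setlcrange is made up for this example, and the ranges are only the 
ones mentioned above, so they would still have to be checked against 
the Unicode charts.

% \n may already have been allocated above; a second \newcount is harmless.
\newcount\n
% Hypothetical helper: give every code point from #1 to #2 itself as \lccode.
\def\setlcrange#1#2{%
  \n=#1\relax
  \loop \lccode\n=\n \ifnum\n<#2\relax \advance\n by 1 \repeat}
\setlcrange{"0901}{"0963}% Devanagari
\setlcrange{"0981}{"09CD}% Bengali
\setlcrange{"0B82}{"0BCD}% Tamil
\setlcrange{"0D02}{"0D4D}% Malayalam

The single loop up to "0D4D should give the same result for these 
scripts; it would simply assign codes to every code point in between as 
well, including unassigned ones.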

> and then should it not include Roman transliteration too? However, I 
> must note that if exactly the same patterns are used for the diacritic 
> Roman transliteration the result is readable but a bit strange (you 
> get lines beginning with consonant clusters).

I think it's strange because we compare with hyphenation habits in 
English, German, French... But if we bear in mind that:

tya-
ktvā

is a mere transposition of:

त्य-
क्त्वा

in Latin script, it doesn't seem so strange. If we don't want such 
hyphenations (personally I'm not shocked by them), I guess there would 
have to be etymological patterns, and it may take a fairly long time to 
define them. Also, I suppose they would best be put in a separate file; 
otherwise the above loop would have to become:

\newcount\n \n="0061
\loop \lccode\n=\n \ifnum\n<"0D4D \advance\n by 1 \repeat

and we might hit some memory limit.
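
One way to avoid looping over everything between a and the Indic blocks 
might be to set only the transliteration letters that need it. TeX 
already assigns an \lccode to a--z, so only the lowercase letters with 
diacritics are left. An untested sketch, in which \setlc is a made-up 
helper and the list assumes the usual IAST letters:

% Hypothetical helper: give one code point itself as \lccode.
\def\setlc#1{\lccode#1=#1\relax}
\setlc{"00F1}% ñ
\setlc{"0101}% ā
\setlc{"012B}% ī
\setlc{"015B}% ś
\setlc{"016B}% ū
\setlc{"1E0D}% ḍ
\setlc{"1E25}% ḥ
\setlc{"1E37}% ḷ
\setlc{"1E43}% ṃ
\setlc{"1E45}% ṅ
\setlc{"1E47}% ṇ
\setlc{"1E5B}% ṛ
\setlc{"1E5D}% ṝ
\setlc{"1E63}% ṣ
\setlc{"1E6D}% ṭ

Note that most of these letters lie in the 1E00 block, above 0D4D, so a 
loop stopping at "0D4D would not reach them anyway.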

Kind regards,

Yves


