[XeTeX] Hyphenation in Transliterated Sanskrit

alessandro graheli a.graheli at gmail.com
Mon Sep 12 14:55:57 CEST 2011


Thanks to Dominik for presenting my needs for hyphenating romanised  
Sanskrit according to the syllabic division of Sanskrit traditional  
phonetics. For a number of reasons, in my philologically oriented  
work I prefer to typeset Sanskrit words as faithfully as possible to  
the sources, and hyph-sa.tex fulfils this need.

Yet, I think I understand Dominik on the need for a reader-friendly  
hyphenation of Sanskrit, particularly in texts with less strict  
philological needs, and in English essays with occasional Sanskrit  
terms. In this regard, Dominik's suggestion of adopting the customs  
of the academic tradition makes sense. But how consistently are such  
customs applied? And how many of them are the informed choice of  
scholars, rather than the product of typographers' tastes, dictionaries  
of modern languages, or software-specific algorithms? In any case, I  
think that readability judgements on the hyphenation of Sanskrit are  
largely influenced by one's own habits in hyphenating English,  
Italian, or any other language, so it is difficult to set a universal  
standard other than the Devanagari-conforming one.

As for Italian typesetting, hyphenation of Sanskrit words is  
probably as irregularly applied as in English literature. The  
difference is that some consonant clusters commonly found in Sanskrit  
as well (pr, pl, st etc.) are not broken in Italian hyphenation, as  
they are in English (e.g. ca-sti-tà vs. chas-ti-ty); thus, by adopting  
Italian hyphenation patterns, one probably gets slightly better  
results as far as the traditional syllabic division of Sanskrit is  
concerned.

Best,
Alessandro Graheli



On 12 Sep 2011, at 12:58, Dominik Wujastyk wrote:

I've just had a stimulating conversation about this with my friend  
and fellow Sanskritist, Alessandro Graheli (who also reads this XeTeX  
list, and is doing critical editions of Sanskrit texts with XeTeX).

Alessandro was concerned that I overstated the case.  He has used the  
existing Codet/Kew hyph-sa.tex patterns, and prefers them even for  
romanised Sanskrit.  Word-division after a vowel fits with the forms  
of recitation and caesura that Alessandro learned when he was a  
student in India working extensively with traditional Sanskrit  
pandits.  He also said that Italian typesetting of Sanskrit in  
romanisation hyphenates this way, rather than in the etymological  
manner that I was asserting.

We need more study to sort out some of these issues, but it looks  
prima facie as if both styles of hyphenating romanised Sanskrit  
should be preserved, since there are different usage-groups out  
there.  While the hyphenation style for romanised Sanskrit that I  
describe below reflects widespread usage in good printing over the  
last century or more, mainly in British texts and journals, and may  
well be required in future, there are also people who are comfortable  
with "Devanagari-style" hyphenation in romanised text.

Best,
Dominik

On 11 September 2011 20:40, Dominik Wujastyk <wujastyk at gmail.com> wrote:
Sanskrit is hyphenated differently in Devanagari and in Roman  
script.  If you use the hyph-sa.tex patterns, you get Roman  
hyphenated as if it were Devanagari, which is not acceptable in  
scholarly circles.  Over the last 150 years, European writing on  
Sanskrit in romanisation has developed hyphenation rules based  
on Sanskrit etymology, paying attention to compound words, internal  
sandhi, etc. (i.e., like German in some respects).  Devanagari  
hyphenation uses a much simpler idea: basically, hyphenate after  
almost any vowel.

To get appropriate hyphenation in romanisation, we need to go down  
the Patgen path: develop a large lexicon of appropriately hyphenated  
romanised Sanskrit words in UTF-8 encoding and, when that list is  
reasonably long, process it through Patgen to make patterns.

I am slowly developing such a list, but it would be great to  
collaborate.

While the list is in the making, it can still be used by feeding it  
to \hyphenation.

Thus:

\documentclass{article}

\usepackage{polyglossia,xltxtra}  % plus whatever else you need
% ...
\setmainlanguage{english}    % assumption: English is the main language
\setotherlanguage{sanskrit}  % for transliterated Sanskrit
\newfontfamily\sanskritfont{TeX Gyre Pagella}

% Define \sansk{}, which is the same as \emph{} except that it causes
% appropriate hyphenation for Sanskrit words.
% Use \sansk{} for Sanskrit and \emph{} for English.
\newcommand{\sansk}[1]{\emph{\textsanskrit{#1}}}
% ...
\begin{document}

\input{sanskrit-hyphenations.tex} % see attached file.

Blah English blah.  \sansk{āyurveda, avicchinnasampradāyatvād}.

\end{document}
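
For illustration only, here is a hypothetical sketch of what a file like sanskrit-hyphenations.tex could contain. The break points are illustrative guesses at etymological division, not entries taken from the actual list. TeX files \hyphenation exceptions under whichever language is current when the command is read, so the sketch switches to Sanskrit first. That polyglossia's hyphenrules environment is available for this in your version is an assumption; wrapping the list in the sanskrit environment would be an alternative.

```latex
% sanskrit-hyphenations.tex (hypothetical contents, for illustration)
% Switch to the Sanskrit hyphenation language so that the exceptions
% are stored for Sanskrit and not for the main language.
\begin{hyphenrules}{sanskrit}
\hyphenation{
  āyur-veda
  a-vic-chinna-sam-pra-dāya-tvād
}
\end{hyphenrules}
```

A word listed in \hyphenation is broken only at the marked points, so each entry overrides whatever the loaded patterns would otherwise produce for that exact word. Inflected or compounded forms that are not listed fall back to the patterns.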


Best,
Dominik


