[XeTeX] Hyphenation in Transliterated Sanskrit

Dominik Wujastyk wujastyk at gmail.com
Mon Sep 12 15:33:51 CEST 2011

Alessandro and I agree to disagree about the issue of philological
correctness. I think that hyphenating following etymology, lexicon and
morphemic boundaries is *more* philological than "break after a vowel."  I
think what Alessandro means by philology in this case is that he is
influenced by the usage of manuscript scribes and recitation.  But these are
both traditions in which hyphenation was not theorized at all.   However, in
the end, I don't really think it's about philology at all, but mere

The fact is, there's a huge body of printed work out there, books and
journals, that has accumulated since about 1850, in which Sanskrit is
commonly presented in roman transliteration and is routinely hyphenated
according to compound-breaks (dharma-cakra) and morphemic boundaries
(bhav-a-ti).  A lot of people have got used to this kind of hyphenation,
often subliminally, and want it in their own printed work.  Normally, they
don't get it.  Authors of indological journal articles frequently have to
re-hyphenate Sanskrit words manually in their page proofs.  There is a
continuing demand for this kind of hyphenation.  (Surely you can hear the
thunder of Sanskritists clamouring for etymological hyphenation?
:-)   Since it's not that hard for XeTeX, we can eventually provide it as a
service for those who wish to use it.  I'm not being prescriptive about
this.  Others can use the existing patterns.  Let a thousand flowers bloom.

I'm quite taken by the concept that Alessandro has raised about different
hyphenation traditions for the same language and script in different
countries.  I.e., English (or Sanskrit) might be differently hyphenated in
Italy.  Very interesting.


(coffee later, Alessandro?)

On 12 September 2011 14:55, alessandro graheli <a.graheli at gmail.com> wrote:

>  Thanks to Dominik for presenting my needs for hyphenating romanised
> Sanskrit according to the syllabic division of Sanskrit traditional
> phonetics. For a number of reasons, in my philologically-oriented work I
> prefer to typeset Sanskrit words as faithfully as possible to the sources,
> and the hyph-sa.tex fulfils this need.
> Yet, I think I understand Dominik on the need for a reader-friendly
> hyphenation of Sanskrit, particularly in texts with less strict philological
> needs, and in English essays with occasional Sanskrit terms. In this regard,
> Dominik's suggestion of adopting the customs of the academic tradition makes
> sense. But how consistently are such customs applied? And, how many of them
> are the informed choice of scholars, and not the product of typographers'
> tastes, dictionaries of modern languages, or software-specific algorithms?
> In any case, I think that readability judgements on hyphenation of Sanskrit
> are largely influenced by one's own habits in hyphenating English, Italian,
> or any other language, so it is difficult to set a universal standard other
> than the Devanagari-conforming one.
> As for Italian typesetting, hyphenation of Sanskrit words is probably as
> irregularly applied as in English literature. It is just that, in contrast to
> English, some consonant clusters commonly found also in Sanskrit (pr, pl,
> st etc.) are not broken in Italian hyphenation (e.g. ca-sti-tà vs.
> chas-ti-ty); thus, by adopting Italian hyphenation patterns, one probably
> gets slightly better results as far as the traditional syllabic division of
> Sanskrit is concerned.
> Best,
> Alessandro Graheli
> On 12 Sep 2011, at 12:58, Dominik Wujastyk wrote:
> I've just had a stimulating conversation about this with my friend and
> fellow Sanskritist, Alessandro Graheli (who also reads this XeTeX list, and
> is doing critical editions of Sanskrit texts with XeTeX).
> Alessandro was concerned that I overstated the case.  He has used the
> existing Codet/Kew hyph-sa.tex patterns, and prefers them even for romanised
> Sanskrit.  Word-division after a vowel fits with the forms of recitation and
> caesura that Alessandro learned when he was a student in India working
> extensively with traditional Sanskrit pandits.  He also said that Italian
> typesetting of Sanskrit in romanisation hyphenates this way, rather than in
> the etymological manner that I was asserting.
> We need more study to sort out some of these issues, but it looks prima
> facie as if both styles of hyphenating romanised Sanskrit should be
> preserved, since there are different usage-groups out there.  While the
> hyphenation style for romanised Sanskrit that I describe below reflects
> widespread usage in good printing over the last century or more, mainly in
> British texts and journals, and may be required in future too, there are
> also people who are comfortable with "Devanagari-style" hyphenation in
> Romanised text.
> Best,
> Dominik
> On 11 September 2011 20:40, Dominik Wujastyk <wujastyk at gmail.com> wrote:
>> Sanskrit is hyphenated differently in Devanagari and in Roman script.  If
>> you use the hyph-sa.tex patterns, you get Roman hyphenated *as if it were
>> Devanagari,* which is not acceptable in scholarly circles.  The last 150
>> years of European writing on Sanskrit, using Romanisation, has developed
>> hyphenation rules based on Sanskrit etymology, paying attention to compound
>> words, internal sandhi, etc. (i.e., like German in some respects).  The
>> Devanagari hyphenation uses a much simpler idea, basically hyphenate after
>> almost any vowel.
>> To get appropriate hyphenation in Romanisation, we need to go down the
>> Patgen path.  So we need to develop a large lexicon of
>> appropriately-hyphenated romanised Sanskrit words in UTF8 encoding, and when
>> that list is reasonably long, process it through Patgen to make patterns.
>> I am slowly developing such a list, but it would be great to collaborate.
>> While the list is in the making, it can still be used, by using
>> \hyphenation.
>> Thus:
>> \documentclass{article}
>> \usepackage{polyglossia}  % plus xltxtra, whatnot
>> ...
>> \setotherlanguage{sanskrit}  % for transliterated Sanskrit
>> \newfontfamily\sanskritfont{TeX Gyre Pagella}
>> % Define \sansk{}, which is the same as \emph{}, except that it causes
>> % appropriate hyphenation for Sanskrit words.
>> % Use \sansk{} for Sanskrit and \emph{} for English.
>> \newcommand{\sansk}[1]{\emph{\textsanskrit{#1}}}
>> ...
>> \begin{document}
>> \input{sanskrit-hyphenations.tex} % see attached file.
>> Blah English blah.  \sansk{āyurveda, avicchinnasampradāyatvād}.
>> \end{document}
>> Best,
>> Dominik
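
For readers without the attachment, here is a minimal sketch of what a file
like sanskrit-hyphenations.tex might contain.  The contents are hypothetical,
not the actual attached file: the first two entries are the examples given
earlier in this thread, and the break in the third is an assumption based on
the compound-boundary rule.  It also assumes the file is read while Sanskrit
is the active language, since TeX stores \hyphenation exceptions for whichever
language is current at that point.

```latex
% sanskrit-hyphenations.tex -- hypothetical contents, for illustration only.
% \hyphenation takes a space-separated list of words; each "-" marks a
% permitted break point, and no other breaks are allowed in a listed word.
% Assumption: Sanskrit is the current language when this file is read.
\hyphenation{
  dharma-cakra
  bhav-a-ti
  āyur-veda
}
```

The same hyphenated forms, one word per line, are roughly the kind of word
list that would later be fed to Patgen, so an exception list of this sort can
grow into the lexicon described above.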
