[XeTeX] Hyphenation in Transliterated Sanskrit

Mon Sep 12 12:58:46 CEST 2011

I've just had a stimulating conversation about this with my friend and
fellow Sanskritist, Alessandro Graheli (who also reads this XeTeX list, and
is doing critical editions of Sanskrit texts with XeTeX).

Alessandro was concerned that I overstated the case.  He has used the
existing Codet/Kew hyph-sa.tex patterns, and prefers them even for romanised
Sanskrit.  Word-division after a vowel fits with the forms of recitation and
caesura that Alessandro learned when he was a student in India working
extensively with traditional Sanskrit pandits.  He also said that Italian
typesetting of Sanskrit in romanisation hyphenates this way, rather than in
the etymological manner that I was asserting.

We need more study to sort out some of these issues, but it looks prima
facie as if both styles of hyphenating romanised Sanskrit should be
preserved, since there are different usage-groups out there.  The
hyphenation style for romanised Sanskrit that I describe below reflects
widespread usage in good printing over the last century or more, mainly in
British texts and journals, and may be required in future too.  But there
are also people who are comfortable with "Devanagari-style" hyphenation in
romanised text.

Best,
Dominik

On 11 September 2011 20:40, Dominik Wujastyk <wujastyk at gmail.com> wrote:

> Sanskrit is hyphenated differently in Devanagari and in Roman script.  If
> you use the hyph-sa.tex patterns, you get Roman hyphenated *as if it were
> Devanagari,* which is not acceptable in scholarly circles.  The last 150
> years of European writing on Sanskrit, using Romanisation, has developed
> hyphenation rules based on Sanskrit etymology, paying attention to compound
> words, internal sandhi, etc. (i.e., like German in some respects).  The
> Devanagari hyphenation uses a much simpler idea, basically hyphenate after
> almost any vowel.
>
> To get appropriate hyphenation in Romanisation, we need to go down the
> Patgen path.  So we need to develop a large lexicon of
> appropriately-hyphenated romanised Sanskrit words in UTF-8 encoding, and
> when that list is reasonably long, process it through Patgen to make
> patterns.
>
> I am slowly developing such a list, but it would be great to collaborate.
>
> While the list is in the making, it can still be used by loading its
> entries as \hyphenation exceptions.
>
> Thus:
>
> \documentclass{article}
>
> \usepackage{polyglossia,xltxtra} % and whatnot
> ...
> \setotherlanguage{sanskrit}  % for transliterated Sanskrit
> \newfontfamily\sanskritfont{TeX Gyre Pagella}
>
> % Define \sansk{} which is the same as \emph{}, except that it causes
> % appropriate hyphenation for Sanskrit words.  Use \sansk{} for Sanskrit and
> % \emph{} for English.
> \newcommand{\sansk}[1]{\emph{\textsanskrit{#1}}}
> ...
> \begin{document}
>
> \input{sanskrit-hyphenations.tex} % see attached file.
>
> Blah English blah.  \sansk{āyurveda, avicchinnasampradāyatvād}.
>
> \end{document}
>
>
> Best,
> Dominik
>
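
For illustration only, here is a minimal sketch of what an exceptions file
such as sanskrit-hyphenations.tex might contain.  The attached file itself is
not reproduced here, so the entries and their break points are assumptions:
they simply mark compound boundaries in the two words used in the example
above.  TeX stores \hyphenation exceptions for whichever language is current
when they are declared, so the sketch switches to Sanskrit first, assuming the
sanskrit environment that polyglossia sets up after
\setotherlanguage{sanskrit}.

```latex
% sanskrit-hyphenations.tex (hypothetical contents, for illustration only)
% Exceptions are recorded for the language that is current at this point,
% so switch to Sanskrit before declaring them.
\begin{sanskrit}
\hyphenation{
  āyur-veda
  avicchinna-sampradāya-tvād
}
\end{sanskrit}
```

Words listed this way may break only at the marked hyphens, so entries can be
added one at a time while a larger word list for Patgen is still being
compiled.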