[XeTeX] Hyphenation in Transliterated Sanskrit

Sun Sep 11 20:40:48 CEST 2011

Sanskrit is hyphenated differently in Devanagari and in Roman script.  If
you use the hyph-sa.tex patterns, Roman text gets hyphenated *as if it were
Devanagari,* which is not acceptable in scholarly circles.  Over the last 150
years, European writing on Sanskrit in Romanisation has developed
hyphenation rules based on Sanskrit etymology, paying attention to compound
words, internal sandhi, etc. (i.e., like German in some respects).
Devanagari hyphenation uses a much simpler idea: basically, hyphenate after
almost any vowel.

To get appropriate hyphenation in Romanisation, we need to go down the
Patgen path.  That means developing a large lexicon of appropriately
hyphenated romanised Sanskrit words in UTF-8 encoding and, once that list
is reasonably long, processing it through Patgen to make patterns.

I am slowly developing such a list, but it would be great to collaborate.

While the list is in the making, it can already be put to use by loading
its entries as exceptions with \hyphenation.

Thus:

\documentclass{article}

\usepackage{polyglossia,xltxtra} % and whatnot
...
\setotherlanguage{sanskrit}  % for transliterated Sanskrit
\newfontfamily\sanskritfont{TeX Gyre Pagella}

% Define \sansk{} which is the same as \emph{}, except that it causes
% appropriate hyphenation for Sanskrit words.
% Use \sansk{} for Sanskrit and \emph{} for English.
\newcommand{\sansk}[1]{\emph{\textsanskrit{#1}}}
...
\begin{document}

\input{sanskrit-hyphenations.tex} % see attached file.

Blah English blah.  \sansk{āyurveda, avicchinnasampradāyatvād}.

\end{document}
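
For readers who cannot get at the attachment, here is a rough sketch of what
a file like sanskrit-hyphenations.tex could contain.  It is not the attached
file: the break points are illustrative assumptions placed at compound
boundaries, and wrapping the list in polyglossia's sanskrit environment is
likewise an assumption, made so that the exceptions are recorded for Sanskrit
rather than for the main language.

```latex
% sanskrit-hyphenations.tex -- illustrative sketch only, NOT the attached file.
% Meant to be \input after \begin{document}, as in the example above.
% The break points are assumptions (compound boundaries); check each one
% against your own lexicon before relying on it.
\begin{sanskrit}% make Sanskrit the current language
\hyphenation{
  āyur-veda
  avicchinna-sampradāya-tvād
}
\end{sanskrit}
```

\hyphenation stores its exceptions globally, under whichever language is
current when it is called, so they survive the end of the environment.  Each
entry matches only that exact word form, so other inflected forms of a word
need entries of their own.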

Best,
Dominik
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sanskrit-hyphenations.tex
Type: application/x-tex
Size: 2973 bytes
Desc: not available
URL: <http://tug.org/pipermail/xetex/attachments/20110911/7dd76076/attachment.tex>