[XeTeX] Hyphenated, transliterated Sanskrit.

Santhosh Thottingal santhosh.thottingal at gmail.com
Mon Nov 22 17:11:03 CET 2010

On Mon, Nov 22, 2010 at 8:05 PM, Arthur Reutenauer
<arthur.reutenauer at normalesup.org> wrote:
>> If Indic scripts hyphenate in the same way in all the languages that
>> use the script
>  I've seen no evidence to let me think that they do, but I'm happy
> about any input.  Santhosh, since you obviously used Yves' hyphenation
> patterns for Sanskrit as a basis for your files, can you tell us a bit
> more about that?  I'm curious in particular about the rule "do not break
> before a final consonant", which you stripped.

Hi all,
As far as I know, for Indian languages it is true that languages
using the same script have the same hyphenation patterns. So there
should be no difference between Sanskrit and Hindi (Devanagari
script), or between Assamese and Bengali (Bengali script).

Across Indian scripts, the basic rules are almost the same, but not
entirely. Tamil has major differences from Malayalam, for example.

"Do not break before a final consonant or cluster" is not a valid rule
as far as I know. At least for my mother tongue, Malayalam, I am sure
that this rule does not exist. For other languages I relied on input
from my friends, and I have not come across this rule so far. Even
so, the rule often ends up being applied in practice when the "minimum
characters after break" setting that many applications provide is
used.
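
To illustrate the last point, here is a minimal sketch in Python of how
a "minimum characters after break" setting can suppress a break before
a final consonant even when the patterns allow it. The function name
filter_breaks, the transliterated word, and the candidate break
positions are all made up for illustration; they are not taken from
any real pattern file or application.

```python
def filter_breaks(word, candidate_breaks, left_min=1, right_min=2):
    """Keep only the break positions that leave at least left_min
    characters before the break and right_min characters after it.

    candidate_breaks holds character offsets into word, as a pattern
    matcher might produce them (hypothetical input for this sketch).
    """
    n = len(word)
    return [p for p in candidate_breaks
            if p >= left_min and n - p >= right_min]


# Hypothetical transliterated word with breaks bha-ga-va-t.
word = "bhagavat"
candidates = [3, 5, 7]

# With right_min=1 the break before the final "t" survives.
print(filter_breaks(word, candidates, right_min=1))  # [3, 5, 7]

# With right_min=2 it is dropped, although no pattern forbids it.
print(filter_breaks(word, candidates, right_min=2))  # [3, 5]
```

This sketch counts plain characters. For text in an Indic script, an
application would more likely count whole syllable clusters, which is
not shown here.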

There is one thing to note when discussing a single pattern file for
all Indic scripts. The patterns are used by many applications other
than TeX, and it is reasonable for them to rely on the system locale,
the detected script, or a user-supplied language code to find out
which hyphenation rules to use. So it is a reasonable use case for a
user to search for a hyphen-ml_IN package in a distro if they want to
use Malayalam hyphenation in OpenOffice. Most popular GNU/Linux
distros have a metapackage for language support; for example,
language-support-ml installs everything required for Malayalam. It is
easy for the maintainers of that metapackage to link it to the
hyphenation package for the particular language. So I don't see much
benefit in merging all of them.
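
The lookup described above could be sketched roughly as follows in
Python. This is only an illustration: the "hyphen-<code>" naming
scheme follows the hyphen-ml_IN example, while the function names, the
fallback from "ml_IN" to "ml", and the set of available packages are
assumptions of mine and do not describe any particular distro or
application.

```python
def normalize_locale(locale_name):
    """Strip codeset and modifier: 'ml_IN.UTF-8' -> 'ml_IN'."""
    return locale_name.split(".", 1)[0].split("@", 1)[0]


def hyphenation_package(locale_name, available):
    """Return the per-language hyphenation package for a locale, or
    None if there is none.

    available is a set of package names (hypothetical for this sketch).
    The full code (e.g. 'ml_IN') is tried first, then the bare
    language code (e.g. 'ml').
    """
    code = normalize_locale(locale_name)
    for candidate in (code, code.split("_", 1)[0]):
        name = "hyphen-" + candidate
        if name in available:
            return name
    return None


# Hypothetical set of packages offered by a distro.
available = {"hyphen-ml_IN", "hyphen-ta"}

print(hyphenation_package("ml_IN.UTF-8", available))  # hyphen-ml_IN
print(hyphenation_package("ta_IN", available))        # hyphen-ta
print(hyphenation_package("hi_IN", available))        # None
```

With one package per language, the lookup stays a simple name match.
With a single merged Indic package, every application would need an
extra table mapping languages to that one package.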

I think we can compare this with how Indic fonts are packaged and
maintained in Linux distros. Debian used to have a single
ttf-indic-fonts package. Now it is a metapackage with dependencies on
ttf-malayalam-fonts, ttf-tamil-fonts, ttf-hindi-fonts, etc., and that
makes the task easier for maintainers and bug reporters.

PS: I maintain a git repo for Indic hyphenation patterns
(http://git.savannah.gnu.org/cgit/smc/hyphenation.git). It is the
upstream repo for Fedora, OpenOffice, etc.

Santhosh Thottingal
