[XeTeX] Hyphenation for Kannada

Zdenek Wagner zdenek.wagner at gmail.com
Fri Mar 9 11:13:58 CET 2012


2012/3/9 Shiva Shankar <shivably04sdst at gmail.com>:
> Hi,
>
> I have a question about using hyphenation rules in LaTeX. Based on the
> generic rules or patterns written in XeLaTeX for the Kannada language, I
> want to write patterns for the Kanlel package. Kanlel is not for UTF-8
> data; it is like the Velthuis Devanagari package for typesetting
> Sanskrit. After writing patterns for Kannada, how can I test them? And
> where do I need to specify lefthyphenminchar and righthyphenminchar?
> Should I follow the babel route, or is there any way to test them
> directly?
>
This is a bit difficult to answer. Generally, hyphenation patterns must
be loaded when the LaTeX format is being generated (XeTeX, like Knuth's
TeX, accepts \patterns only in INITEX mode). When doing so, you have to
assign a proper number to the \language register. Later, in your
document, you need to set \language to the same value in order to use
these patterns. This is also the place to set \lefthyphenmin and
\righthyphenmin. When loading a font, you have to set \hyphenchar
unless the hyphen is available in the standard slot. The advantage of
babel is that you can give the language a symbolic name, and babel will
set the correct values of \lefthyphenmin and \righthyphenmin
automatically when the language is selected. Remember that different
users may have different languages installed, so the \language value
for a given language may vary, but the symbolic name will be the same.
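As for testing patterns directly, without babel: a minimal sketch is a
single XeTeX run in INITEX mode that loads plain TeX, allocates a new
\language number, sets \lefthyphenmin and \righthyphenmin, loads the
patterns and inspects the result with \showhyphens. The file name
kntest.tex, the min values and the toy patterns (which merely allow a
break after each ASCII vowel) are placeholders, not real Kannada
patterns; \kanfont in the comment is a hypothetical font.

```tex
% kntest.tex -- hypothetical test file; run it as:  xetex -ini kntest.tex
% \patterns is accepted only while a format is being built (INITEX
% mode), so plain TeX is loaded and tested in the same run, no \dump.
\input plain
\newlanguage\kannada            % allocate a fresh \language number
\language=\kannada              % select it for the following paragraphs
\lefthyphenmin=1                % placeholder values: use whatever the
\righthyphenmin=2               % pattern author recommends
\patterns{a1 e1 i1 o1 u1}       % toy patterns: allow a break after a vowel
% If the font used for the text has its hyphen outside the standard
% slot, tell TeX where it is, e.g.  \hyphenchar\kanfont="2D
\showhyphens{kannada sanskrita} % hyphenated words are reported in the log
\bye
```

Because nothing is dumped, each change to the patterns only needs a
rerun of this file; once they behave, they can be registered for format
generation in the usual way (in TeX Live, through language.dat).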

Now the problem of hyphenation itself. TeX hyphenates words, and a word
is defined as a sequence of characters with \catcode=11 and a nonzero
\lccode. At least in Velthuis Devanagari, some characters are built
from pieces and conjuncts do not encode the virama, which means that
patterns written in UTF-8 would be unusable. Moreover, some matras are
typeset by macros that create false word boundaries. Simply put,
hyphenation patterns are unusable with Velthuis Devanagari, and the
situation with Kanlel will most probably be exactly the same. In
Velthuis Devanagari, hyphenation can optionally be generated by the
preprocessor, which puts \- at all feasible hyphenation points.
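For contrast, with UTF-8 input in XeTeX the word requirement above is
easy to meet. A minimal sketch, assuming plain TeX's \loop macro and the
Unicode Kannada block U+0C80 to U+0CFF: every code point in the block,
including the vowel signs and the virama U+0CCD, gets letter catcode
and a nonzero \lccode (its own value), so consonants, matras and virama
stay inside one word that patterns can see. A XeTeX format may already
set these codes; the snippet only makes the requirement explicit.
Mapping each code point to itself is harmless because Kannada has no
case distinction.

```tex
% Make every code point of the Kannada block (U+0C80..U+0CFF) a letter
% with a nonzero \lccode equal to itself, so that consonants, vowel
% signs (matras) and the virama U+0CCD all belong to one TeX word.
% Uses plain TeX's \loop ... \repeat and the scratch register \count255.
\count255="0C80
\loop
  \catcode\count255=11
  \lccode\count255=\count255
\ifnum\count255<"0CFF
  \advance\count255 by 1
\repeat
% The preprocessor route instead marks each allowed break explicitly,
% e.g. ka\-nna\-da in transliterated input (illustration only).
```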

The last question is whether such a system is still needed now that we
have XeTeX. I understand that people may not have a UTF-8 keyboard for
non-Latin scripts, or may have old files written in some
transliteration that they need to process. I would recommend another
method: generate a TECkit map. TeX Live contains the package
xetex-devanagari with several such mappings, and ArabXeTeX is another
example of such maps. The advantage is that you specify the mapping
when loading the font; the text is then converted automatically and no
preprocessing is needed. This is the way I would go now.
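A minimal sketch of that route for plain XeTeX. Everything
Kannada-specific here is an assumption: the map name kannada-translit,
its five example rules (shown as comments, in the TECkit syntax used by
TeX Live's tex-text.map) and the font Lohit Kannada. The .map source has
to be compiled to kannada-translit.tec with teckit_compile and placed
where XeTeX can find it.

```tex
% Hypothetical kannada-translit.map (TECkit source, compiled to .tec);
% longer input sequences are listed first:
%   LHSName "Kannada-translit"
%   RHSName "UNICODE"
%   pass(Unicode)
%   U+006B U+0069 > U+0C95 U+0CBF   ; ki -> KA + VOWEL SIGN I
%   U+006B U+0061 > U+0C95          ; ka -> KA
%   U+006B        > U+0C95 U+0CCD   ; k  -> KA + VIRAMA
%   U+0069        > U+0C87          ; i  -> LETTER I
%   U+0061        > U+0C85          ; a  -> LETTER A
%
% Attach the mapping when loading the font: transliterated input is
% converted to Unicode Kannada while typesetting, with no preprocessor.
\font\kannada="Lohit Kannada:mapping=kannada-translit" at 11pt
{\kannada kaki} % ASCII transliteration in, Kannada script out
\bye
```

A real map would also need rules for the remaining consonants, vowel
signs and conjuncts; in XeLaTeX with fontspec, the same map is attached
through the Mapping= font option.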

> --
> Regards
> Shivashankar
> Srirangapatna



-- 
Zdeněk Wagner
http://hroch486.icpf.cas.cz/wagner/
http://icebearsoft.euweb.cz


