[XeTeX] Sanskrit hyphenation

Jonathan Kew jonathan_kew at sil.org
Tue Mar 29 22:36:06 CEST 2005


On 29 Mar 2005, at 5:14 pm, Yves Codet wrote:

> On 29 March 05, at 09:32, Jonathan Kew wrote:
>
>> This was just done to work around a deficiency in the 
>> unicode-letters.tex file that XeTeX uses in building the formats. I 
>> think I may have corrected that, so that Indic letters will be 
>> accepted as "letters" for hyphenation without this extra code; can't 
>> remember if I ever checked this.
>>
>> Anyway, if there is still a problem, the correct solution is to fix 
>> unicode-letters.tex, and omit this stuff from hyphenation files.
>
> There seems to be a problem. If I delete the loop I get this:
>
> xeinitex -jobname=splain \*plain \\input sanhyph.tex \\dump
> This is XeTeX, Version 3.141592-2.2-0.93 (Web2C 7.5.3) (INITEX)
> entering extended mode
> (/usr/local/teTeX/share/texmf.tetex/tex/plain/base/plain.tex
> Preloading the plain format: codes, registers, parameters, fonts, more 
> fonts,
> macros, math definitions, output routines, hyphenation
> (/usr/local/teTeX/share/texmf.tetex/tex/generic/hyphen/hyphen.tex))
> (./sanhyph.tex
> ! Nonletter.
> l.18 1अ
>          1
> ?

This is because your command doesn't load unicode-letters.tex, so all 
characters that plain TeX doesn't know about have catcode "other". If 
you look in xetex.ini (the input file used to create the "plain" 
xetex.fmt file), you'll see that it does:

	\input plain
	\input unicode-letters
	\dump

So, similarly, you'll do better if you say:

	xeinitex -jobname=splain \*plain \\input unicode-letters \\input sanhyph.tex \\dump

Actually, if I were you, I'd make an "splain.ini" file, on the pattern 
of "xetex.ini", containing the lines:

	\input plain
	\input unicode-letters
	\input sanhyph
	\dump

Then to create the .fmt file, you should be able to say:

	xeinitex \*splain.ini

Unfortunately, if you try this, you'll find that it still fails (a bit 
further along); it turns out that unicode-letters.tex gives "letter" 
catcodes to the characters with Unicode "General Category" of Letter, 
but not to those classified as combining marks. This includes 
Devanagari vowel matras, nukta, etc. So patterns containing those will 
generate errors.

I'm inclined to update the script that creates unicode-letters.tex so 
that it classifies combining marks as letters; this won't always be 
right, but on the whole, I think it's more useful that way. So expect 
that to change in the next release; meanwhile, you do still need to set 
the codes for at least those characters before you can load the 
patterns successfully.
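
As a stopgap, something along these lines in splain.ini, after \input 
unicode-letters and before \input sanhyph, might be enough. This is an 
untested sketch: the names \skcount and \setletters are made up for 
illustration, the ranges are just my reading of the Devanagari block, 
and including ZWNJ/ZWJ is an assumption based on the patterns using 
them.

	\newcount\skcount
	% give every character from U+#1 to U+#2 (hex) a "letter" catcode
	% and a nonzero \lccode, so that \patterns will accept it
	\def\setletters#1#2{%
	  \skcount="#1
	  \loop
	    \catcode\skcount=11 \lccode\skcount=\skcount
	    \ifnum\skcount<"#2 \advance\skcount by 1
	  \repeat}
	\setletters{0901}{0903} % candrabindu, anusvara, visarga
	\setletters{093C}{093C} % nukta
	\setletters{093E}{094D} % vowel matras and virama
	\setletters{200C}{200D} % ZWNJ, ZWJ (assumed to occur in the patterns)

The \lccode is what \patterns actually checks (a zero \lccode gives 
the "Nonletter" error), so the loop sets both codes together.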

> Before line 18 there were rules about ZWJ and ZWNJ (which I commented 
> out here), and they were also treated as non-letters.
>
> Another difficulty: for the LaTeX format I used to write this:
>
> xeinitex -jobname=slatex \\input latex.ini sanhyph.tex \\dump
>
> but with this command "sanhyph.tex" is now ignored. Has the syntax 
> changed?

I'm not sure; maybe latex.ini has changed? I don't see how this could 
work with the current latex.ini, which does \input latex.ltx; latex.ltx 
in turn ends with \dump, which will terminate the job. So the rest of 
the command line will never be used.

The correct way to get additional hyphenation patterns into latex is 
via language.dat, I believe.
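
For illustration only, a language.dat entry is one line giving a 
language name followed by the pattern file; the name "sanskrit" here 
is just an assumed label, not something any package necessarily 
expects:

	% language.dat: <language name> <pattern file>
	sanskrit sanhyph.tex

The format then has to be rebuilt for the new patterns to be loaded, 
and the same caveat about character codes applies when the pattern 
file is read.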

JK


