[XeTeX] Hyphenation patterns and Unicode

Jonathan Kew jonathan_kew at sil.org
Wed Oct 19 13:32:19 CEST 2005


On 19 Oct 2005, at 11:48 am, Nicola Vitacolonna wrote:

> Hi everybody,
> the XeTeX FAQ says that hyphenation patterns should be "true  
> Unicode" files. It is not clear to me if the following (excerpt of  
> a) file (for Lithuanian) is ok:
>
> \def\ltletters{
> \catcode"81=11\lccode"81="A1\uccode"81="81%A nosine
> \catcode"83=11\lccode"83="A3\uccode"83="83%C su pauksteliu
> \catcode"84=11\lccode"84="A4\uccode"84="84%E su tasku
> % etc...
> }
> \ltletters
> \patterns{
> .ap1
> .api1
> .a^^b23v
> %etc...
> }
>

This does not appear to be Unicode-compliant, as it expects
character codes such as (hex) 81, 83, and 84 to be accented letters.
(Because the file does not contain these codes as literal bytes, but
writes them as ^^.. sequences, XeTeX will be able to read it; but the
resulting patterns won't be correct for Unicode text.)

I assume this file was created to work with one of the 8-bit
encodings used with TeX, such as T1, whose code positions for the
accented letters do not match Unicode.
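
To illustrate the mismatch, here is a sketch of how the three letters named in the quoted excerpt's comments might be declared with Unicode code points instead of 8-bit ones. The code points are an assumption read off those comments (A nosine as U+0104/U+0105, C su pauksteliu as U+010C/U+010D, E su tasku as U+0116/U+0117); they are not taken from the file itself.

```tex
% Sketch only: Unicode counterparts of the 8-bit declarations in the excerpt.
% Code points are assumed from the excerpt's comments, not from the file.
\catcode"0104=11 \lccode"0104="0105 \uccode"0104="0104 % A nosine
\catcode"010C=11 \lccode"010C="010D \uccode"010C="010C % C su pauksteliu
\catcode"0116=11 \lccode"0116="0117 \uccode"0116="0116 % E su tasku
```

None of these positions coincide with hex 81, 83, or 84, which is why patterns built from the 8-bit codes would not apply to Unicode text.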

> I would like to add this file to language.dat, rebuild all format  
> files, and use LaTeX or XeLaTeX with babel. Is this expected to  
> work? Or should I use the above file only for LaTeX with babel, and  
> go for a different solution when I want to use XeLaTeX?

It would be possible to patch this file for XeTeX/Unicode in a  
similar way to others that I've looked at: test if it is being loaded  
by XeTeX, and if so, make the characters active and define them to  
expand to their Unicode equivalents. That way, the actual pattern  
lines can be left untouched, and the file still works as before when  
used with a standard TeX.
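
The approach described above might look roughly like the following sketch. It is not a tested patch: the helper file name xu-lt-sketch.tex is hypothetical, and the mapping of hex A1, A3, and A4 to U+0105, U+010D, and U+0117 is an assumption based on the \lccode lines and comments in the quoted excerpt. The XeTeX-only definitions are kept in a separate file here so that a standard TeX never has to scan the ^^^^ notation.

```tex
% (1) Added to the pattern file, just before \patterns{
\expandafter\ifx\csname XeTeXrevision\endcsname\relax\else
  \begingroup              % XeTeX only: keep the changes below local
  \input xu-lt-sketch.tex  % hypothetical helper file, contents in (3)
\fi

% ... the original \patterns{ ... } block, left untouched ...

% (2) Added right after the closing brace of \patterns
\expandafter\ifx\csname XeTeXrevision\endcsname\relax\else
  \endgroup                % loaded patterns are global and survive this
\fi

% (3) Contents of the hypothetical helper file xu-lt-sketch.tex.
%     Each 8-bit code becomes an active character that expands to the
%     assumed Unicode letter while \patterns reads its argument.
\catcode"A1=\active \def^^a1{^^^^0105}% a nosine
\catcode"A3=\active \def^^a3{^^^^010d}% c su pauksteliu
\catcode"A4=\active \def^^a4{^^^^0117}% e su tasku
% \patterns only accepts characters with a non-zero \lccode
\lccode"0105="0105 \lccode"010D="010D \lccode"0117="0117
% One such pair of lines is needed for every other 8-bit code used in
% the patterns; the excerpt does not say which letter ^^b2 stands for,
% so it is not guessed here.
```

Placing the test just before \patterns means the active definitions take effect after anything \ltletters does to the same codes, and the group keeps the active characters out of the finished format.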

I don't see this file among the standard collection, but if you need  
assistance in adapting it for XeTeX, feel free to send me a copy.

JK


