[XeTeX] XeTeX hyphenation support for supplementary chars?

Jonathan Kew jonathan_kew at sil.org
Fri May 11 22:53:03 CEST 2007


On 11 May 2007, at 8:26 pm, Kenneth Reid Beesley wrote:

> "XeTeX, the Multilingual Lion:  TeX meets Unicode and smart font
> technologies", Jonathan Kew, TUGboat, Vol. 25 (2005), No. 2.
>
> "Hyphenation support:  Along with other character-code-oriented parts
> of TeX, the hyphenation tables in XeTeX have been extended to support
> 16-bit Unicode characters.  This means that it is possible to write
> hyphenation patterns that use any (Plane 0) Unicode letters, including
> non-Latin scripts as well as extended Latin (accented characters,
> etc.)"
>
> Taken at face value, these statements would seem to indicate that
> one cannot define \lccodes for Deseret Alphabet characters (there is
> an uppercase/lowercase distinction in this alphabet) and that one
> cannot define hyphenation tables over supplementary characters.

You are correct.

> Am I stuck? or am I missing something?

These statements are accurate as of XeTeX 0.996, the latest released  
version, and so you are currently stuck.
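
To make the limitation concrete, here is a minimal Python sketch (the
helper names are hypothetical, and this is not XeTeX code). It shows
that a Deseret letter has a code point above U+FFFF, so it cannot
index a table keyed by 16-bit values, and that it needs two 16-bit
units in UTF-16:

```python
# Deseret block: capitals U+10400..U+10427, small letters U+10428..U+1044F.
DESERET_CAPITAL_LONG_I = 0x10400


def fits_in_16_bits(code_point):
    """True if the code point lies in Plane 0 (the BMP)."""
    return code_point <= 0xFFFF


def utf16_units(ch):
    """Return the UTF-16 code units of a single character as integers."""
    data = ch.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


letter = chr(DESERET_CAPITAL_LONG_I)
print(fits_in_16_bits(ord(letter)))            # False
print([hex(u) for u in utf16_units(letter)])   # ['0xd801', '0xdc00']
# The lowercase partner is also in Plane 1, so an \lccode entry for it
# needs a table that accepts values above 0xFFFF.
print(hex(ord(letter.lower())))                # 0x10428
```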

However, this has been changed for version 0.997, currently in
development. While that has not yet been released, the extension to
full Unicode support is present in the 0.997-dev version that you get
if you build from the Subversion repository at
<http://scripts.sil.org/svn/xetex/TRUNK/>.

So you will be able to do this once 0.997 is released, or if you  
build from source in the meantime. (Actually, I haven't tested  
supplementary-plane hyphenation patterns yet; I'd better do that  
before releasing the new version! Please let me know if you do try  
this.)

I can think of a possible workaround, if you're not ready to compile  
xetex from source: create a font that encodes the Deseret alphabet in  
the Plane 0 Private Use Area, and load this font with a font mapping  
that converts the true Plane 1 values in your data to the PUA codes.  
Then you will be able to define hyphenation patterns in terms of the  
PUA codes you're using, even though your actual text remains  
correctly encoded in Plane 1. It's a hack, but I believe it should  
work. (Untested.)
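
The code-point arithmetic behind such a mapping can be sketched in
Python. This is only an illustration under stated assumptions: the
PUA base U+E000 is an arbitrary choice, the function names are
hypothetical, and in the actual workaround the conversion would be
done by the font mapping inside XeTeX rather than by a script.

```python
# Deseret block in Plane 1 (80 code points).
DESERET_FIRST = 0x10400
DESERET_LAST = 0x1044F
# Assumption: any free range in the Plane 0 Private Use Area
# (U+E000..U+F8FF) would do; U+E000 is picked arbitrarily here.
PUA_BASE = 0xE000


def to_pua(text):
    """Shift Deseret characters into the PUA; leave everything else alone."""
    out = []
    for ch in text:
        cp = ord(ch)
        if DESERET_FIRST <= cp <= DESERET_LAST:
            cp = cp - DESERET_FIRST + PUA_BASE
        out.append(chr(cp))
    return "".join(out)


def from_pua(text):
    """Inverse of to_pua, for checking that the mapping is reversible."""
    span = DESERET_LAST - DESERET_FIRST
    out = []
    for ch in text:
        cp = ord(ch)
        if PUA_BASE <= cp <= PUA_BASE + span:
            cp = cp - PUA_BASE + DESERET_FIRST
        out.append(chr(cp))
    return "".join(out)


small_long_i = chr(0x10428)
print(hex(ord(to_pua(small_long_i))))   # 0xe028
```

With this particular choice of base, hyphenation patterns would be
written using U+E000..U+E04F, and every code in them fits in 16 bits.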

JK


