[XeTeX] Hyphenation of supplementary characters in Xe(La)TeX ?

Kenneth Reid Beesley krbeesley at gmail.com
Wed Jan 21 06:52:34 CET 2009


I'm using Xe(La)TeX to typeset a book in supplementary characters
(the Deseret Alphabet), and I need to define some hyphenation (to begin
with, just a few dozen cases in \hyphenation{}).

I understand that a character is not considered a letter, and is
therefore not visible to hyphenation, unless it has a non-zero lccode.
In the old 16-bit-limited days, you could specify things like

\lccode"2019="0027

to make the official Unicode apostrophe (U+2019) act like the ASCII
apostrophe as far as hyphenation is concerned.  Or just

\lccode"2019="2019

to make sure that it has a non-zero lccode.  The problem was that even
XeTeX 0.996 was limited to 16-bit BMP characters.

With 0.997, XeTeX was supposed to handle hyphenation for supplementary
characters.  (I've now got 0.999.)

Question:  What is the syntax for specifying a supplementary-character
code point value in \lccode?
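
A minimal sketch of what such assignments might look like, assuming that
XeTeX 0.997 and later accept code points above "FFFF in \lccode with the
same hexadecimal notation used for BMP characters (an assumption, not
something confirmed in this message).  The counter names \DesUC and
\DesLC are invented for illustration; the code points are the Deseret
capitals (U+10400 to U+10427) and small letters (U+10428 to U+1044F):

```tex
% Sketch only: assumes \lccode and \uccode accept values above "FFFF.
\newcount\DesUC  \DesUC="10400  % first Deseret capital letter
\newcount\DesLC  \DesLC="10428  % first Deseret small letter
\loop
  \lccode\DesUC=\DesLC  % a capital lowercases to its small letter
  \uccode\DesUC=\DesUC
  \lccode\DesLC=\DesLC  % non-zero lccode, so the small letter counts as a letter
  \uccode\DesLC=\DesUC
  \advance\DesUC by 1
  \advance\DesLC by 1
\ifnum\DesUC<"10428 \repeat  % stop after the 40 capital/small pairs
```

With codes like these in place, the entries in \hyphenation{} would be
written with the Deseret letters themselves, using hyphens to mark the
permitted break points.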

Thanks,

Ken

Begin forwarded message:

> From: xetex-request at tug.org
> Date: 12 May 2007 04:00:01 MDT
> To: xetex at tug.org
> Subject: XeTeX Digest, Vol 38, Issue 12
> Reply-To: xetex at tug.org
>
> Today's Topics:
>
>   1. XeTeX hyphenation support for supplementary chars?
>      (Kenneth Reid Beesley)
>   2. Re: XeTeX hyphenation support for supplementary chars?
>      (Jonathan Kew)
>   3. Re: [OS X TeX] fontspec, disabling something. (Bruno Voisin)
>   4. Does it make any sense to specify layout engine in font name?
>      (Jjgod Jiang)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Fri, 11 May 2007 13:26:40 -0600
> From: "Kenneth Reid Beesley" <krbeesley at gmail.com>
> Subject: [XeTeX] XeTeX hyphenation support for supplementary chars?
> To: xetex at tug.org
> Message-ID:
> 	<a9fb8f90705111226w1d557382p9eaf334d0f0c63d3 at mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
>    XeTeX hyphenation support for supplementary chars?
>
> I'm typesetting large documents in supplementary characters, in the
> Deseret Alphabet (U+10400 to U+1044F) to be precise, and my only
> remaining problem is hyphenation.
>
> JK previously suggested that I'd need to define \lccodes for the
> Deseret Alphabet characters, so that patterns could be defined, or so
> that \hyphenation could be effective.  It took me (not a TeX guru) a
> while to understand that \lccode is used both for lowercasing and for
> hyphenation control, but then I found the following:
>
> "About XeTeX", Jonathan Kew, 17 Oct 2005
>
> "Because XeTeX works with UTF-16 code units, TeX commands that deal
> with character codes, such as \char, \catcode, \lccode, etc., have
> been extended to handle 16-bit values (up to 65535, or "FFFF). Note
> that it is not possible to assign individual character properties such
> as \catcode to non-Plane 0 Unicode characters, because these are
> treated as a pair of surrogate codes; however, there is probably
> little reason to need to do this. Supplementary-plane characters can
> still be treated as normal text to be typeset."
>
>
> "XeTeX, the Multilingual Lion:  TeX meets Unicode and smart font
> technologies", Jonathan Kew, TUGboat, Vol. 25 (2005), No. 2.
>
> "Hyphenation support:  Along with other character-code-oriented parts
> of TeX, the hyphenation tables in XeTeX have been extended to support
> 16-bit Unicode characters.  This means that it is possible to write
> hyphenation patterns that use any (Plane 0) Unicode letters, including
> non-Latin scripts as well as extended Latin (accented characters,
> etc.)"
>
> Taken at face value, these statements would seem to indicate that
> one cannot define \lccodes for Deseret Alphabet characters (there is
> an uppercase/lowercase distinction in this alphabet) and that one
> cannot define hyphenation tables over supplementary characters.
>
> Am I stuck? or am I missing something?
>
> Thanks,
>
> Ken
>
>
> ------------------------------
>
> Message: 2
> Date: Fri, 11 May 2007 21:53:03 +0100
> From: Jonathan Kew <jonathan_kew at sil.org>
> Subject: Re: [XeTeX] XeTeX hyphenation support for supplementary
> 	chars?
> To: Unicode-based TeX for Mac OS X and other platforms <xetex at tug.org>
> Message-ID: <A8DDA88D-DC66-4537-BCB3-734084D4136F at sil.org>
> Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed
>
> On 11 May 2007, at 8:26 pm, Kenneth Reid Beesley wrote:
>
>> "XeTeX, the Multilingual Lion:  TeX meets Unicode and smart font
>> technologies", Jonathan Kew, TUGboat, Vol. 25 (2005), No. 2.
>>
>> "Hyphenation support:  Along with other character-code-oriented parts
>> of TeX, the hyphenation tables in XeTeX have been extended to support
>> 16-bit Unicode characters.  This means that it is possible to write
>> hyphenation patterns that use any (Plane 0) Unicode letters,
>> including non-Latin scripts as well as extended Latin (accented
>> characters, etc.)"
>>
>> Taken at face value, these statements would seem to indicate that
>> one cannot define \lccodes for Deseret Alphabet characters (there is
>> an uppercase/lowercase distinction in this alphabet) and that one
>> cannot define hyphenation tables over supplementary characters.
>
> You are correct.
>
>> Am I stuck? or am I missing something?
>
> These statements are accurate as of XeTeX 0.996, the latest released
> version, and so you are currently stuck.
>
> However, this has been changed for version 0.997, currently in
> development. While that has not yet been released, the extension to
> full Unicode support is present in the 0.997-dev version that you get
> if you build from the Subversion repository at
> <http://scripts.sil.org/svn/xetex/TRUNK/>.
>
> So you will be able to do this once 0.997 is released, or if you
> build from source in the meantime. (Actually, I haven't tested
> supplementary-plane hyphenation patterns yet; I'd better do that
> before releasing the new version! Please let me know if you do try
> this.)
>
> I can think of a possible workaround, if you're not ready to compile
> xetex from source: create a font that encodes the Deseret alphabet in
> the Plane 0 Private Use Area, and load this font with a font mapping
> that converts the true Plane 1 values in your data to the PUA codes.
> Then you will be able to define hyphenation patterns in terms of the
> PUA codes you're using, even though your actual text remains
> correctly encoded in Plane 1. It's a hack, but I believe it should
> work. (Untested.)
>
> JK
>
>
>
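
The Private Use Area workaround quoted above might look roughly like
this on the TeX side.  This is only a sketch: the font name
"Deseret PUA" and the mapping name deseret-to-pua are hypothetical, as
is the choice of U+E000 to U+E04F as the PUA range, and like the
suggestion itself it is untested.

```tex
% Hypothetical font with the Deseret glyphs at U+E000..U+E04F, loaded with
% a hypothetical font mapping that converts U+10400..U+1044F to those codes.
\font\deseretpua="Deseret PUA:mapping=deseret-to-pua" at 11pt

% Give the PUA codes letter status, pairing capitals with small letters.
\newcount\PuaUC  \PuaUC="E000  % PUA stand-in for the first capital
\newcount\PuaLC  \PuaLC="E028  % PUA stand-in for the first small letter
\loop
  \lccode\PuaUC=\PuaLC
  \lccode\PuaLC=\PuaLC
  \advance\PuaUC by 1
  \advance\PuaLC by 1
\ifnum\PuaUC<"E028 \repeat  % 40 capital/small pairs
```

Hyphenation exceptions or patterns would then be written in terms of
the PUA codes rather than the Plane 1 characters.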


******************************
Kenneth R. Beesley, D.Phil.
P.O. Box 540475
North Salt Lake, UT
84054  USA






