[XeTeX] New Hyphenation for Phonemic Orthographies for English?

Jonathan Kew jonathan_kew at sil.org
Fri Apr 20 00:00:57 CEST 2007

On 19 Apr 2007, at 8:04 pm, Kenneth Reid Beesley wrote:
> My current project sets Deseret Alphabet and International
> Phonetic Alphabet in parallel columns, and I need to define
> hyphenation for both.  Too many long words are splaying over
> the right margins.  The language is English, but written in
> two separate phonemic-alphabet orthographies.  In the future,
> might need hyphenation for the Shavian Alphabet as well.
>    Q1:  What's the best approach to defining possible hyphenation
>    points for English words written in these orthographies?  The
>    Deseret Alphabet and Shavian characters lie in the supplementary
>    Unicode space, so whatever scheme I adopt would need to handle
>    supplementary characters.
>    Q2:  Is it possible to specify globally that the end of an
>    em-dash is a possible hyphenation point in text?

I'm not sure if anyone has yet tried to define hyphenation patterns  
for languages using supplementary-plane characters in XeTeX, but  
offhand I can't think of any reason it shouldn't work. Note that  
defining hyphenation patterns in general is a fairly technical job,  
though. In any case, the first question would be what the rules are  
supposed to be for hyphenation of English in these various  
orthographies; is there any established practice to follow, or will  
you be making up your own rules? Based on syllabic or morphemic  
boundaries (or something else)?

Re Q2, there's a simple answer; just include the setting

   \XeTeXdashbreakstate = 1

in your document. This tells XeTeX to allow line-breaks after the  
Unicode en-dash and em-dash characters, just as standard TeX normally  
does with "--" and "---" ligatures.
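
For illustration, a minimal test file along these lines might look like
the sketch below. It assumes a XeLaTeX setup with fontspec; the font
name is only a placeholder for any installed font that contains U+2014,
and the em-dash is written in XeTeX's ^^^^2014 notation so that the
example survives ASCII-only mail. The narrow \parbox is there just to
force line breaks.

```tex
% Compile with xelatex.
\documentclass{article}
\usepackage{fontspec}
\setmainfont{Charis SIL}  % assumption: substitute any font with U+2014

% Allow a line break after the Unicode en-dash (U+2013) and
% em-dash (U+2014) characters.
\XeTeXdashbreakstate=1

\begin{document}
% ^^^^2014 is read by XeTeX as the single character U+2014.
\parbox{4cm}{A fairly long sentence^^^^2014with an aside set off by
Unicode em-dashes^^^^2014that has to wrap inside a narrow box.}
\end{document}
```

Changing the setting to 0 and recompiling should show whether the
breaks after the dashes go away in your configuration.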

(I just noticed that Will's "XeTeX reference guide" claims that this  
parameter is set to 1 by default, but I don't think that's true, at  
least in the standard TeX Live 2007 configuration.)

