[XeTeX] help with hyphenation

Jonathan Kew jonathan_kew at sil.org
Sat Feb 2 19:03:33 CET 2008


On 2 Feb 2008, at 5:45 pm, ashinpan at gmail.com wrote:

> Hi all,
>
> I have been using XeTeX as part of TeX Live 2007 on Ubuntu (Gutsy
> Gibbon). My documents are mainly in English, but Pali and Sanskrit are
> often embedded in them.
>
> The problem I am facing is that some words are getting randomly
> hyphenated before line breaks. I tried changing the language to Welsh,
> for which I have no language file, to force manual hyphenation, but to
> no avail. I also tried removing the babel package, but the problem
> still persists.
>
> Below is the TeX source that I use. I have also attached a PDF file
> that XeTeX produces on my machine from that source. In that PDF file,
> I see the following unnatural mid-line hyphenations, listed together
> with their respective line numbers:
>
> design-ing (1)
> ope-rat-ing (2)
> pro-ducts (5)
> fly-ing (6)
> pos-sesses (8)
> under-stand (9)
> mak-ing (18)
> natu-ral (32)
>
> I hope someone can kindly help me out.

Looking at the TeX source, I see that these are present in the input  
text as "soft hyphen" characters, U+00AD. Perhaps your editor is  
inserting these automatically, and you don't see them on screen while  
editing?

XeTeX doesn't inherently "know" anything special about the U+00AD
character, so each one is simply printed in the current font, wherever
it happens to fall in the line.

I think you want to do one of three possible things:

(a) remove the "soft hyphens" from the input text, and prevent your  
editor inserting them; just rely on TeX's hyphenation patterns where  
necessary

(b) if that's difficult, you could cause TeX to ignore them by  
"defining them away":

     \catcode"AD=\active  \def^^ad{}

(c) if you want them to act as discretionary hyphens, overriding  
whatever hyphenation points TeX might find automatically in those  
words, then define them as TeX discretionaries:

     \catcode"AD=\active  \def^^ad{\-}
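
As a minimal sketch of how (c) might sit in a complete file, assuming a
XeLaTeX document compiled with xelatex: the sample text and the 2cm box
width are made up for illustration, and ^^ad is written out in the body
only so that the soft hyphens are visible here (in a real file they
would be raw U+00AD characters).

     \documentclass{article}

     % Make U+00AD active and let it expand to a discretionary
     % hyphen, as in (c) above.
     \catcode"AD=\active
     \def^^ad{\-}

     \begin{document}
     % Hypothetical sample text; the narrow box just encourages
     % line breaks so the discretionaries get a chance to be used.
     \parbox{2cm}{design^^ading ope^^adrat^^ading systems}
     \end{document}

With this in place, the marked words should break only at the marked
points, if they break at all. Replacing the definition with \def^^ad{}
would give the behaviour of (b) instead.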


I might add (c) as a default definition to the xetex and xelatex  
formats, as it seems like the most logical thing to do with U+00AD.  
But in general, you shouldn't need these in your text at all.

HTH,

JK


