[XeTeX] Hyphenation of compounds, and interaction with teckit

Jonathan Kew jonathan_kew at sil.org
Thu Dec 9 00:42:40 CET 2004


On 8 Dec 2004, at 11:10 pm, Bruno Voisin wrote:

> Hello,
>
> A puzzle regarding hyphenation of compound words in XeTeX: some time 
> ago there was a discussion about this, which led Jonathan to suggest 
> putting in one's XeTeX input file
>
> 	\lccode`\- = 0

(...as a workaround for a hyphenation bug, due to be corrected in the 
next release.)

>
> Applying this to an input file of mine, and using in all cases 
> :mapping=tex-text (with Hoefler Text, without Will's fontspec.sty --- 
> yet):
>
> - With the above, I get
>
> 	"small-amplitude" not hyphenated and protruding in the right margin
>
> 	"Mellin--Barnes" hyphenated as Mel-/lin--Barnes (/ meaning newline)
>
> 	"Mellin–Barnes" hyphenated as Mel-/lin–Barnes (with "–" the Unicode 
> en-dash)
>
> - Without the above, I get
>
> 	"small-amplitude" hyphenated as small-amp-/litude
>
> 	"Mellin--Barnes" not hyphenated and protruding in the right margin
>
> 	"Mellin–Barnes" hyphenated as Mel-/lin–Barnes
>
> Are there bugs here, or is this all to be expected? I must say I 
> prefer XeTeX's original behaviour, which is to hyphenate compound 
> words, whether containing standard dashes "-" or en-dashes "–".

A brief answer: the difference between "Mellin--Barnes" and 
"Mellin–Barnes" arises because hyphenation applies to the character 
string as found in the document (after macro expansion, etc.); the fact 
that the font-mapping changes "--" into "–" is at a lower level (part 
of the font rendering process, in effect).

So to ensure these two forms behave the same, you'd need to set the 
properties of the Unicode en-dash appropriately so that it suppresses 
auto-hyphenation in a similar way to the actual hyphen character. (Not 
sure exactly what that would require, offhand.)
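
One possible approach, offered only as an untested sketch: in standard 
TeX, what stops the parts of "small-amplitude" from being hyphenated is 
the empty discretionary that TeX inserts after the hyphen character. 
Assuming XeTeX applies the same rule, an active en-dash that typesets 
U+2013 followed by an empty discretionary might behave the same way. 
The active definition below is an assumption of this sketch, not 
something confirmed for XeTeX.

```tex
% Untested sketch: make the literal en-dash (U+2013) an active character.
\catcode`\–=\active
% Typeset the en-dash glyph, then add an empty discretionary, which is
% what TeX itself inserts after the hyphen character.
\def–{\char"2013 \discretionary{}{}{}}
```

This would only affect a literal "–" in the source; "--" would still 
reach the font mapping unchanged. As with "-", the discretionary also 
allows a line break right after the dash.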

Regarding hyphenating portions of compound words, this is contrary to 
TeX's rules, and should be changed by the forthcoming bug-fix; if you 
still want it to happen, you can permit it by appropriately programming 
the en-dash character. (Make it an active character that adds a 0pt 
\hskip each side, or something like that, I'd guess.)
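
A minimal sketch of that idea, untested and assuming the en-dash is 
typed as the literal Unicode character: zero-width glue on each side 
means each half of the compound is seen as a separate word by the 
hyphenation routine. The \nobreak and \leavevmode are additions of this 
sketch, meant to stop a line from starting with the dash.

```tex
% Untested sketch: make the literal en-dash (U+2013) an active character.
\catcode`\–=\active
\def–{\leavevmode
  \nobreak\hskip 0pt\relax  % glue before; \nobreak forbids a break here
  \char"2013\relax          % the en-dash glyph itself
  \hskip 0pt\relax}         % glue after; a break after the dash is allowed
```

With this in place, input such as Mellin–Barnes should allow hyphens 
inside "Mellin" and inside "Barnes", as well as a break after the dash.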

JK


