[XeTeX] Hyphenation of "--" with tex-text mapping on

Jonathan Kew jonathan_kew at sil.org
Sat Nov 5 03:27:03 CET 2005


On 4 Nov 2005, at 9:44 am, Bruno Voisin wrote:

> I've just noticed a strange thing, which after all might be normal:
> using the Optima font with tex-text mapping, through the fontspec
> package and:
>
> \setromanfont[Mapping=tex-text]{Optima Regular}
>
> then inputting en-dashes in traditional TeX form as --, it turns  
> out the en-dash may be hyphenated in the middle, yielding the  
> following output (the input is CNRS--INRA--INSERM--Couperin):
>
>
> <Image 1.png>
>
> (note the hyphenation at the end of the first line). If now the
> en-dash is input directly as a Unicode character (so that the input is
> CNRS–INRA--INSERM--Couperin), then it isn't hyphenated:
>
>
> <Image 2.png>
>
> Is this normal TeX behaviour, or a feature of the font mapping in  
> TECkit?

It's a known limitation of using the font mapping mechanism to  
simulate this legacy TeX convention. (Well, it was known to me,  
anyhow! But I don't think it's been discussed previously.) You'll  
find em-dashes entered as --- are similarly vulnerable.

The reason it can happen is that font mappings (unlike traditional
TFM-based ligatures) are completely invisible to TeX's line-breaking
process. As far as TeX is concerned, the text still contains two
hyphens, and there's a legitimate discretionary break after each of
them. This is different from TeX using a TFM font, where the pair of
hyphens is replaced by a ligature node, and the only resulting
discretionary break is after that node. So the font mapping mechanism
is not always completely equivalent to the old TFM ligatures.
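
To see the difference, here is a minimal test file along the lines of
Bruno's example. It is only a sketch: it assumes XeLaTeX with fontspec
and an installed "Optima Regular" font, and the very narrow \parbox is
just a common trick to make TeX take every break it considers legal.

```latex
% Compile with xelatex. Assumes the font "Optima Regular" is installed;
% substitute any other font available on your system.
\documentclass{article}
\usepackage{fontspec}
\setromanfont[Mapping=tex-text]{Optima Regular}
\begin{document}
\setlength{\parindent}{0pt}

% Mapped input: TeX still sees two hyphens, each followed by a
% discretionary break, so a line break may fall between them.
\parbox{1pt}{\hspace{0pt}CNRS--INRA}

\bigskip

% Direct Unicode input: the en-dash is a single character, so there
% is nothing to split.
\parbox{1pt}{\hspace{0pt}CNRS–INRA}
\end{document}
```

Expect overfull-box warnings from the narrow boxes; they are harmless
here.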

When writing for TUGboat recently, I noted that the proper style
there involves use of a \Dash macro, rather than ---, as the way to
generate a dash. Doing something like this makes it easy to ensure
the proper behavior in both legacy and Unicode cases; I used a
XeTeX-friendly Unicode definition of \Dash in my document in order to
avoid this precise issue.
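
A macro of that kind might look like the sketch below. This is a
hypothetical definition for illustration, not the TUGboat class's
\Dash and not necessarily the one used in the document mentioned
above; the thin spaces and the \allowbreak after the dash are
assumptions about the desired style.

```latex
% Compile with xelatex. The definition of \Dash here is hypothetical.
\documentclass{article}
\usepackage{fontspec} % default font; any font with a U+2014 glyph works
% Emit the em-dash as one Unicode character, so no font mapping is
% involved and the dash itself cannot be split across lines.
%   \unskip\nobreak  remove the preceding space and forbid a break
%                    before the dash
%   \char"2014       EM DASH, addressed by its Unicode code point
%   \allowbreak      permit a line break after the dash
\newcommand*{\Dash}{%
  \unskip\nobreak\thinspace\char"2014\allowbreak\thinspace\ignorespaces}
\begin{document}
Font mappings are convenient \Dash but invisible to the line breaker.
\end{document}
```

For a legacy TFM-based setup the same macro name could instead expand
to the traditional --- ligature, so the document source stays the same
in both cases.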

JK


