[XeTeX] Strange hyphenation with polyglossia in French

enrico.gregorio at univr.it enrico.gregorio at univr.it
Sat Oct 16 12:44:32 CEST 2010

> Hello all,
> I'd never had (or noticed) that problem before, so I don't know 
> if it's a new thing or something I do that does not comply. The 
> problem is simple, hyphenation occurs between an apostrophe and 
> the word it follows : l'information in my case becomes l'-information.
> I'm using the latest updated 2010 MacTeX.
> I didn't find anything helpful in my searches online, so, since 
> I cannot change the text (it's an article from Le Monde)  
> what can I do?
> In making the small version I include here, I also noticed 
> something surprising: the hyphenation changed depending on the 
> length of the text I cut *after*. Leaving one sentence or two 
> did not give the same results. Cutting immediately after the 
> sentence gave the correct l'in-formation and  l'infor-
> mation was what I got when cutting before De toute part.
> I also noticed that including or not
> \defaultfontfeatures{Mapping=tex-text}
> changes things quite a bit while \frenchspacing did nothing 
> obvious. I thought it would deal with spaces around the 
> guillemets etc. but no. I'm wondering why I bothered including 
> it. Is that a benefit from polyglossia?

\frenchspacing is issued anyway when the current language is French.
It makes sense, doesn't it? :)

The "Mapping=tex-text" option makes all the usual TeX ligature conventions
available (?` for the inverted question mark, --- for an em dash, and so on).
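
As a rough illustration of what the mapping does, here is a minimal sketch of
a test file. The font name is only an assumption of mine (substitute any
OpenType font installed on your system); the rest uses the same
\defaultfontfeatures line quoted above.

```latex
% Compile with xelatex.
\documentclass{article}
\usepackage{fontspec}
% Declared before the font is selected, so that the font picks it up.
\defaultfontfeatures{Mapping=tex-text}
% Assumption: this font is installed; any other OpenType font will do.
\setmainfont{Latin Modern Roman}
\begin{document}
% Input conventions handled by tex-text.map:
% inverted punctuation, dashes, curly quotes and the apostrophe.
?` !` -- --- ``texte'' l'avenir
\end{document}
```

Commenting out the \defaultfontfeatures line and recompiling should show the
difference, in particular for the apostrophe.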

I tried some other words such as "l'avenir", which are hyphenated correctly.
I believe that the problem is in line 615 of hyph-fr.tex (the file containing the
hyphenation patterns for French) which reads "1informat".

It's quite subtle, I believe. There are no patterns containing U+2019 (RIGHT
SINGLE QUOTATION MARK), into which each apostrophe is changed by
tex-text.map; so the pattern "1informat" comes into play, creating a hyphenation
point in "l'information" just after the character U+2019.

Indeed, also "l'alcool" gets hyphenated as "l’-al-cool", as there is the pattern "1alcool"
on line 126 of hyph-fr.tex.
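
To look at the break points without depending on where the line happens to
end, something like the following sketch may help. It relies on the usual
trick of setting each word in a zero-width \parbox, which forces TeX to break
at every hyphenation point it allows. The font name is again just an
assumption, and the overfull box warnings this produces are expected.

```latex
% Compile with xelatex.
\documentclass{article}
\usepackage{fontspec}
\defaultfontfeatures{Mapping=tex-text}
% Assumption: this font is installed; any other OpenType font will do.
\setmainfont{Latin Modern Roman}
\usepackage{polyglossia}
\setdefaultlanguage{french}
\begin{document}
% \hspace{0pt} lets the first word of the paragraph be hyphenated;
% the zero width forces a break at every permitted hyphenation point.
\parbox[t]{0pt}{\hspace{0pt}l'information}\hspace{4cm}%
\parbox[t]{0pt}{\hspace{0pt}l'alcool}\hspace{4cm}%
\parbox[t]{0pt}{\hspace{0pt}l'avenir}
\end{document}
```

Comparing the three columns with and without the \defaultfontfeatures line
should show whether a break appears right after the apostrophe.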

It's a vicious circle: without "Mapping=tex-text" the apostrophes in the input don't
get transformed into the right character (all Unicode fonts have a straight quote in
that position); but when it's used, there are no patterns containing U+2019.

This is a problem that should be examined by the "hyphenation pattern team":
every pattern containing the apostrophe should be duplicated with U+2019 in its
place. The same effect may show up in Italian and in any other language where the
apostrophe gets a nonzero \lccode for hyphenation purposes.
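
To check how the two characters are treated in a given setup, one can print
their \lccode values to the terminal and log. This is only a diagnostic sketch:
it reports the values in force after the language has been selected and does
not change anything.

```latex
% Compile with xelatex and look at the terminal output or the .log file.
\documentclass{article}
\usepackage{fontspec}
\usepackage{polyglossia}
\setdefaultlanguage{french}
\begin{document}
% A value of 0 means the character is not treated as a letter
% for hyphenation purposes.
\typeout{lccode of U+0027 (apostrophe): \the\lccode"0027}
\typeout{lccode of U+2019 (right single quotation mark): \the\lccode"2019}
Test.
\end{document}
```

Replacing french with italian in \setdefaultlanguage gives the same check for
Italian.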


Enrico Gregorio          + Dipartimento di Informatica          + Tel: +39 045 8027937
Enrico.Gregorio at univr.it + Università degli Studi di Verona     +
(gregorio at math.unipd.it) + Strada le Grazie 15 / I-37134 Verona + Fax: +39 045 8027928
