[XeTeX] Strange hyphenation with polyglossia in French
enrico.gregorio at univr.it
Sat Oct 16 12:44:32 CEST 2010
> Hello all,
> I'd never had (or noticed) that problem before, so I don't know
> if it's a new thing or something I do that does not comply. The
> problem is simple: hyphenation occurs between an apostrophe and
> the word that follows it: l'information in my case becomes l'-information.
> I'm using the latest updated 2010 MacTeX.
> I didn't find anything helpful in my searches online, so, since
> I cannot change the text (it's an article from Le Monde)
> what can I do?
> In making the small version I include here, I also noticed
> something surprising: the hyphenation changed depending on the
> length of the text I cut *after*. Leaving one sentence or two
> did not give the same results. Cutting immediately after the
> sentence gave the correct l'in-formation and l'infor-
> mation was what I got when cutting before De toute part.
> I also noticed that including or not
> changes things quite a bit while \frenchspacing did nothing
> obvious. I thought it would deal with spaces around the
> guillemets etc. but no. I'm wondering why I bothered including
> it. Is that a benefit from polyglossia?
\frenchspacing is issued anyway when the current language is French.
It makes sense, doesn't it? :)
The "Mapping=tex-text" option makes available all the usual TeX ligature
conventions (?` for the inverted question mark, --- for an em dash, and so on).
I tried some other words such as "l'avenir", which are hyphenated correctly.
I believe that the problem is in line 615 of hyph-fr.tex (the file containing the
hyphenation patterns for French) which reads "1informat".
It's quite subtle, I believe. There are no patterns containing U+2019 (RIGHT
SINGLE QUOTATION MARK), into which each apostrophe is changed by
tex-text.map; so the pattern "1informat" comes into play, creating a hyphenation
point in "l'information" just after the character U+2019.
Indeed, also "l'alcool" gets hyphenated as "l’-al-cool", as there is the pattern "1alcool"
on line 126 of hyph-fr.tex.
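
To make the mechanism concrete, here is a toy model of Liang-style pattern matching in Python. It is only a sketch: it ignores \lefthyphenmin and \righthyphenmin, and the pattern list is not the real hyph-fr.tex. "1informat" and "1alcool" are the two patterns quoted above, while "'2i" is a hypothetical stand-in for whatever apostrophe patterns the real file contains. Odd weights allow a break, even weights forbid one, and the highest weight at each position wins.

```python
def parse_pattern(pattern):
    """Split a pattern such as '1informat' into its letters and slot weights."""
    letters = []
    weights = [0]  # weights[j] is the slot before letters[j]; the last one follows the final letter
    for ch in pattern:
        if ch.isdigit():
            weights[-1] = int(ch)
        else:
            letters.append(ch)
            weights.append(0)
    return "".join(letters), weights


def break_points(word, patterns):
    """Return every index i such that a break is allowed just before word[i]."""
    padded = "." + word + "."  # dots mark the word boundaries, as in TeX patterns
    scores = [0] * (len(padded) + 1)  # scores[k] is the slot before padded[k]
    for pattern in patterns:
        letters, weights = parse_pattern(pattern)
        start = padded.find(letters)
        while start != -1:
            for offset, weight in enumerate(weights):
                scores[start + offset] = max(scores[start + offset], weight)
            start = padded.find(letters, start + 1)
    # Only slots strictly inside the word count; an odd score means "break allowed".
    return [k - 1 for k in range(2, len(padded) - 1) if scores[k] % 2 == 1]


def hyphenate(word, patterns):
    """Insert '-' at every allowed break point."""
    points = set(break_points(word, patterns))
    return "".join(("-" if i in points else "") + ch for i, ch in enumerate(word))


# "'2i" is HYPOTHETICAL: a stand-in for the real apostrophe patterns of hyph-fr.tex.
PATTERNS = ["1informat", "1alcool", "'2i"]

print(hyphenate("l'information", PATTERNS))       # ASCII apostrophe (U+0027)
print(hyphenate("l\u2019information", PATTERNS))  # U+2019, as produced by the mapping
```

With the ASCII apostrophe the inhibiting pattern matches and outweighs "1informat"; with U+2019 nothing matches the quote character, so the break point from "1informat" survives.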
It's a vicious circle: without "Mapping=tex-text" the apostrophes in the input don't
get transformed into the right character (all Unicode fonts have a straight quote in
that position); but when it's used, there are no patterns containing U+2019.
This is a problem which should be examined by the "hyphenation pattern team":
all patterns containing the apostrophe should be duplicated with U+2019 in its place.
It may show its effects also in Italian and all other languages where the apostrophe
gets a nonzero \lccode for hyphenation purposes.
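
The duplication itself would be mechanical. A minimal Python sketch, assuming the patterns are available as a plain list of strings (the helper name and the sample pattern "'2i" are hypothetical and not taken from hyph-fr.tex):

```python
APOSTROPHE = "'"               # U+0027, what the current patterns contain
RIGHT_SINGLE_QUOTE = "\u2019"  # U+2019, what tex-text.map produces


def with_curly_duplicates(patterns):
    """Return the patterns, each apostrophe pattern followed by a U+2019 twin."""
    result = []
    for pattern in patterns:
        result.append(pattern)
        if APOSTROPHE in pattern:
            result.append(pattern.replace(APOSTROPHE, RIGHT_SINGLE_QUOTE))
    return result


# "'2i" is a HYPOTHETICAL pattern used only to exercise the function.
print(with_curly_duplicates(["1informat", "'2i"]))
```

Patterns without an apostrophe pass through unchanged, so the existing behaviour for plain ASCII input would not be affected.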
Enrico Gregorio + Dipartimento di Informatica + Tel: +39 045 8027937
Enrico.Gregorio at univr.it + Università degli Studi di Verona +
(gregorio at math.unipd.it) + Strada le Grazie 15 / I-37134 Verona + Fax: +39 045 8027928