[XeTeX] Strange hyphenation with polyglossia in French

Roland Kuhn rk at rkuhn.info
Sun Oct 17 10:09:12 CEST 2010

On Oct 16, 2010, at 15:21 , Cyril Niklaus wrote:

> On 16 oct. 2010, at 19:44, enrico.gregorio at univr.it wrote:
>> It's quite subtle, I believe. There are no patterns containing U+2019 (RIGHT
>> SINGLE QUOTATION MARK), into which each apostrophe is changed by
>> tex-text.map; so the pattern "1informat" comes into play, creating a hyphenation
>> point in "l'information" just after the character U+2019.
> […]
>> Indeed, also "l'alcool" gets hyphenated as "l’-al-cool", as there is the pattern "1alcool"
>> on line 126 of hyph-fr.tex
>> This is a problem which should be examined by the "hyphenation pattern team":
>> all patterns containing the apostrophe should be duplicated with U+2019 in its place.
>> It may show its effects also in Italian and all other languages where the apostrophe
>> gets a nonzero \lccode for hyphenation purposes.
> and 
> On 16 oct. 2010, at 20:42, Mojca Miklavec wrote:
>> what you
>> observe is a "known problem that needs a nice idea to solve it" (or we
>> can simply create and load another bunch of patterns) and it's present
>> in both XeTeX and LuaTeX (only that it's mapped to quotation mark in
>> LuaTeX).
> […]
>> We would need to double all the hyphenation patterns to account for
>> that case (including both apostrophe and quotation marks). An
>> alternative would be to "explain to engine" that two characters
>> hyphenate in exactly the same way. The latter is possible, but we
>> never (managed to) implement it. It might be as simple as one line of
>> code though ...
> OK, so I understand the nature of the problem now, thanks to all of you.
> As much as I would like to find that one line of code, my coding skills are unfortunately nonexistent, and I could never produce what the great minds on this list have made. If I somehow reach illumination and find a way to deal with this, I will of course let you know.
> On 16 oct. 2010, at 20:57, Jonathan Kew wrote:
>> Would setting
>> \lccode "2019 = "27
>> be any help?
> I do have it in the document preamble, to no effect (with straight or curled apostrophes).
Well, setting \lccode"2019="0027 actually does fix this problem. Of course, this needs to be done after polyglossia has had its say (e.g. after \begin{document} or after your most recent language-changing command), because gloss-french.ldf resets it. So I think patching gloss-french.ldf would be the minimal fix.
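
For illustration, a minimal sketch of where the setting could go in a document. The document class, the sample text and the narrow \parbox (there only to force line breaks) are my own assumptions, as is typing U+2019 directly in the source. Compile with xelatex and adapt as needed:

```latex
% Sketch only: compile with xelatex.
\documentclass{article}
\usepackage{polyglossia}
\setmainlanguage{french}
\begin{document}
% Must come after polyglossia's French setup (and again after any later
% language switch), because gloss-french.ldf resets this \lccode.
\lccode"2019="0027
% Narrow box to provoke hyphenation; the words use U+2019 directly.
\parbox{1cm}{l’information l’alcool}
\end{document}
```

With the \lccode line in place there should no longer be a break right after the apostrophe. Commenting it out should bring back the behaviour discussed above.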

On the other hand, why not do it right? "0027 is some ASCII single-high-vertical-short-line which was used in the Middle Ages of text input to mean apostrophe, single quotation mark, prime, etc. Now that we have gone way past the French Revolution (pun intended), why not enter those characters as they deserve? I find myself not using tex-text.map anymore, since terminals and editors can properly display all those nice glyphs directly.

BTW: does anyone know a “terminal” which can use proportional fonts? That would be a nice compromise between Turing-complete language and intra-paragraph-WYSIWYG (I’m sorry to say that I’m basically married to vim).

> In the meantime, the "solution" I used was to change fonts…
That basically disables hyphenation for this word, just as \/ would.


Simplicity and elegance are unpopular because they require hard work and discipline to achieve and education to be appreciated.
  -- Dijkstra
