[XeTeX] Strange hyphenation with polyglossia in French

enrico.gregorio at univr.it
Mon Oct 18 22:36:41 CEST 2010


> Would \savinghyphcodes help? According to the documentation of
> e-TeX, setting this parameter to a positive value would save the
> \lccode values in effect during the execution of \patterns and e-TeX (so also
> XeTeX and LuaTeX) would use those "frozen" values for hyphenation
> purposes.

Here is the result of a small experiment. I copied a version of
loadhyph-it.tex, renamed it to hyphen.tex, and modified it as follows:

BEFORE:
\begingroup
\lccode`\'=`\'
% ASCII patterns - no additional support is needed
\message{ASCII Italian Hyphenation Patterns}
\input hyph-it.tex
\endgroup

AFTER:
\begingroup
\savinghyphcodes=1
\lccode`\'=`\'
\lccode"2019=`\'
% ASCII patterns - no additional support is needed
\message{ASCII Italian Hyphenation Patterns}
\input hyph-it.tex
\endgroup

I ran "xetex -jobname myxetex -ini -etex plain", which produced myxetex.fmt in
the current directory. Then I prepared the following test file:

=== test.tex ===
\font\1="TeX Gyre Pagella/ICU:script=latn;language=DFLT;mapping=tex-text;"

\1

\lccode"2019="2019

\hsize=3pt

\noindent a dell'amicizia

\bye
===

Then I ran xetex with "xetex -fmt myxetex test" and this was the result on the
terminal:

=== terminal output ===
This is XeTeX, Version 3.1415926-2.2-0.9997.4 (TeX Live 2010)
 restricted \write18 enabled.
entering extended mode
(./test.tex
Overfull \hbox (2.0pt too wide) in paragraph at lines 9--10
\1 a|

Overfull \hbox (14.14pt too wide) in paragraph at lines 9--10
\1 del-|

Overfull \hbox (11.02pt too wide) in paragraph at lines 9--10
\1 l’a-|

Overfull \hbox (12.07pt too wide) in paragraph at lines 9--10
\1 mi-|

Overfull \hbox (7.68001pt too wide) in paragraph at lines 9--10
\1 ci-|

Overfull \hbox (9.91pt too wide) in paragraph at lines 9--10
\1 zia |
[1] )
(see the transcript file for additional information)
Output written on test.pdf (1 page).
Transcript written on test.log.
===

Note that the apostrophe in the input has been translated into a right quote
(U+2019), as requested by the mapping file, and the hyphenation was
nevertheless correct. With the \lccode"2019="2019 line commented out, the word
was not hyphenated (which is correct, since in "naked" plain TeX that lccode
is 0).
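
As a quick check of which case applies, one could add a line like the following
to test.tex after the \lccode assignment. This is only a sketch and was not part
of the run above; the message text is my own wording. It should report 8217
(hexadecimal 2019) when the assignment is active and, going by the observation
above, 0 when it is commented out.

=== addition to test.tex (sketch) ===
% Print the lowercase code currently assigned to U+2019 on the terminal.
\message{lccode of U+2019: \the\lccode"2019}
===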

I used Italian because the French pattern file contains genuine UTF-8
characters, for which the lccode table would first have to be initialized. But
the experiment seems to prove that this works. Alas, it doesn't in LuaTeX: I
have read that \savinghyphcodes is not enabled in the current version (but it
should be in 0.70, according to Taco Hoekwater).

Ciao
Enrico

--
Enrico Gregorio          + Dipartimento di Informatica          + Tel: +39 045 8027937
Enrico.Gregorio at univr.it + Università degli Studi di Verona     +
(gregorio at math.unipd.it) + Strada le Grazie 15 / I-37134 Verona + Fax: +39 045 8027928
