[XeTeX] Strange hyphenation with polyglossia in French

Mojca Miklavec mojca.miklavec.lists at gmail.com
Wed Oct 20 00:21:12 CEST 2010


On Mon, Oct 18, 2010 at 22:36,  <enrico.gregorio at univr.it> wrote:
>> Would \savinghyphcodes help? According to the documentation of
>> e-TeX, setting this parameter to a positive value would save the
>> \lccode values in effect during the execution of \patterns, and e-TeX (so also
>> XeTeX and LuaTeX) would use those "frozen" values for hyphenation
>> purposes.
>
> I add the result of a small experiment.
/../
> Note that the apostrophe in the input has been translated into right quote
> (U+2019) as requested by the mapping file, but nevertheless the hyphenation
> was correct. Commenting out the \lccode"2019="2019 line, the word was not
> hyphenated (which is correct, since in "naked" plain TeX the lccode is 0).
>
> I used Italian, since the French pattern file contains proper UTF-8 characters
> and it's necessary to initialize the lccode table for them. But the experiment
> seems to prove that this works.

I spent quite some time chatting with Arthur and testing ...

http://tug.org/svn/texhyphen/trunk/tests/testsuite/apostrophe/newlanguage/?pathrev=485
http://tug.org/svn/texhyphen/trunk/tests/testsuite/apostrophe/newlanguage/?pathrev=486

or

svn co  -r 485 svn://tug.org/texhyphen/trunk/tests/testsuite/apostrophe/newlanguage

Note that the two revisions have completely different patterns, so
completely different results are to be expected. Still, to me the
functionality of \savinghyphcodes seems a bit buggy and not usable to
implement "apostrophe is equivalent to quotation mark". For example,
when I use \savinghyphcodes=1:

1.) When I set the lccode of "0027 to "0027 and the lccode of "2019 to
"0027 in the pattern file, "0027 behaves like a letter even if I don't
set its lccode in the document, while "2019 doesn't behave like a
letter. If I set lccode("2019)="0027 in the document, hyphenation of
"2019 works (but it also works without the \savinghyphcodes trickery),
while lccode("2019)="2019 is just as useless as without the trickery.

2.) When, in addition to 1.), I also set lccode(letter n)="0027 in the
patterns, the letter n doesn't behave like "0027 (there's no
hyphenation point where there should be one), whether or not I set
lccode(letter n)="0027 in the document. If the apostrophe hyphenates
automatically without even having its lccode set, the letter n should
hyphenate just as well. But what is worse: now I cannot get the letter
n to hyphenate at all, no matter what I do. That is basically three
different behaviours for three different characters that should behave
in (almost) exactly the same way.
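
For readers who don't want to check out the test suite, the setup in
1.) and 2.) can be sketched roughly as follows. This is only a minimal
illustration, not the actual test files: the language name \langtest
and the pattern l'1a are made-up placeholders, and the fragment assumes
it is read by xetex -ini after plain.tex, before any paragraph has been
typeset.

```tex
% --- pattern side (INITEX only); \langtest and the pattern are hypothetical
\newlanguage\langtest
\language=\langtest
\begingroup
  \savinghyphcodes=1    % save the \lccode values in effect during \patterns
  \lccode"0027="0027    % case 1: apostrophe maps to itself
  \lccode"2019="0027    % case 1: right quote maps to apostrophe
  % \lccode`\n="0027    % case 2: additionally map the letter n
  \patterns{l'1a}       % placeholder pattern containing the apostrophe
\endgroup

% --- document side: the variants described in 1.)
\language=\langtest
% \lccode"2019="0027    % variant reported to hyphenate at "2019
% \lccode"2019="2019    % variant reported not to help
```

The \lccode assignments are local to the group, so only the values
saved by \savinghyphcodes survive into the document.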

I'm sorry: my explanation above is probably not really clear and the
examples might need some extra documentation ... but I'm tempted to
consider the current behaviour a bug unless someone can explain to me
why that behaviour is expected.


On the other hand, even if the \savinghyphcodes command gets fixed ...
there is still a problem. Namely, once the codes are saved with the
patterns there is absolutely no alternative way to change the lccode
used for hyphenation any more. (For example: if I decide that the
quotation mark has to be treated like a letter and be hyphenated like
the apostrophe, and implement that in hyph-utf8, there is no way for
the end user to prevent that behaviour unless they modify the formats.)

It kind of seems to me that having both hyphenation and \lowercase
based on \lccode is almost "broken by design".

At the moment the only viable solution I see is really to have two
sets of patterns; and XeTeX should then load duplicated patterns (one
with apostrophe and one with quotation mark).
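
A duplicated pattern set could look something like the sketch below.
The patterns are invented placeholders (not taken from hyph-utf8), and
I am assuming the file is read as UTF-8 by XeTeX while the format is
being built.

```tex
% Hypothetical fragment: every apostrophe pattern is listed twice.
\begingroup
  \lccode"0027="0027    % apostrophe needs a nonzero lccode inside \patterns
  \lccode"2019="2019    % same for the right quote (U+2019)
  \patterns{
    l'1a                % placeholder pattern with the apostrophe
    l’1a                % identical pattern with U+2019
  }
\endgroup
```

Without \savinghyphcodes, the document would still have to give both
characters a nonzero lccode at the time paragraphs are broken.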

I would really prefer a solution where I could tell XeTeX to treat the
two characters equally, but I simply don't know how to do that ...
Arthur also reminded me that one might want to treat scedilla and
scommaaccent as equivalent characters for Romanian, so the apostrophe
is not the only case ...

> Alas, it doesn't in LuaTeX, because I read that
> \savinghyphcodes is not enabled in the current version (but it should be in 0.70,
> according to Taco Hoekwater).

I added him to CC. Maybe he has some ideas about why this doesn't work.

Mojca

