[XeTeX] Conflict between xunicode and fontspec?
Jonathan Kew
jonathan_kew at sil.org
Wed Feb 6 01:01:04 CET 2008
On 5 Feb 2008, at 6:25 pm, Julien ÉLIE wrote:
> Hi Arthur,
>
> First of all, thanks for your answer.
>
>> You should not use inputenc and fontenc with XeTeX, they simply don't
>> support XeTeX at all.
>
> Well, I have just tried polyglossia:
>
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> % encoding: utf-8
>
> \documentclass[a4paper,12pt]{article}
>
> \usepackage{polyglossia}
> \setdefaultlanguage{french}
>
> \usepackage{xltxtra}
> \usepackage{hyperref}
>
> \begin{document}
>
> âêîôû \textbf{âêîôû}
>
> test ! test ! test! test~!
>
> \end{document}
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
>
> Accents are good but I do not have the right spaces before "!"...
> The first one is an unbreakable space and the second one is a normal
> space. And the result is that I have *two* spaces for the first one

This sounds like polyglossia doesn't recognize the non-breaking space
as a "space", and so adds space of its own; I expect François can
update this.

> and different kinds of *one* space for the others.

In the case of "test!", I think polyglossia is providing a \kern of a
certain width. Presumably "test !" and "test~!" simply give you the
standard space, which may not be the same.
>
> It is not good at all...
>
> However, if I add:
>
> \usepackage[latin1]{inputenc}
> \usepackage[T1]{fontenc}
>
> The result is fine!

I guess \usepackage[latin1]{inputenc} has the effect of converting
some of the accented characters, and probably the non-breaking space,
into LaTeX control sequences, and then some internal macros may deal
with them differently. However, this is not a good idea in xelatex;
if you think about it, you're actually misleading the software,
claiming that your text is Latin-1 when in fact it is UTF-8!

The only reason your accented characters survived at all is that
their Unicode values happen to coincide with their Latin-1
codepoints. So after xetex has decoded the UTF-8 bytes into Unicode
characters, the inputenc package then "decodes" those character
values into LaTeX macros. But this will not work in most other cases;
you were lucky that Latin-1 and Unicode happen to share codepoints
for the characters of interest.
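
As a minimal illustration of that coincidence (a sketch only, assuming xelatex with fontspec and its default Latin Modern fonts; the sample characters are an arbitrary choice and not taken from the test file above):

```latex
% Compile with xelatex. The file itself is plain ASCII: every
% character is requested by its Unicode code point via \char.
\documentclass{article}
\usepackage{fontspec} % assumption: the default Latin Modern fonts are used
\begin{document}

% Code points up to U+00FF are numbered exactly as in Latin-1:
\char"00E2 % U+00E2, a with circumflex; Latin-1 byte 0xE2
\char"00EA % U+00EA, e with circumflex; Latin-1 byte 0xEA

% Code points above U+00FF have no Latin-1 slot at all:
\char"0153 % U+0153, oe ligature; not in Latin-1
\char"20AC % U+20AC, euro sign; not in Latin-1

\end{document}
```

The first pair falls in the range where the two numberings agree, which is why a Latin-1 label happens to describe them correctly; the second pair has no Latin-1 equivalent, so a package that assumes Latin-1 cannot describe them at all.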
I don't know exactly how fontenc is involved here; it may mean that
you end up using different (virtual) fonts. Did you try this in
combination with fontspec-selected fonts, not just the default CM/LM?
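
A variant of the test file that selects a font through fontspec, and leaves out inputenc and fontenc entirely, might look like the sketch below. The font name is only a placeholder assumption; any OpenType font installed on the system could be substituted.

```latex
% encoding: utf-8; compile with xelatex
\documentclass[a4paper,12pt]{article}

\usepackage{fontspec}
% Assumption: this font is installed; replace it with any available font.
\setmainfont{Linux Libertine O}

\usepackage{polyglossia}
\setdefaultlanguage{french}

\begin{document}

âêîôû \textbf{âêîôû}

% In the original report the first "test !" used a non-breaking
% space (U+00A0) and the second one a normal space.
test ! test ! test! test~!

\end{document}
```

Comparing the spacing before "!" in this output with the default-font output would show whether the behaviour depends on the font setup or only on polyglossia.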
If there are language-specific issues like space before footnotes
that polyglossia doesn't yet handle, I hope François will consider
adding support for these; I think this is a much better way forward
than trying to use combinations of old stuff (built for legacy byte
encodings and fonts) and the new Unicode mechanisms.

JK