[XeTeX] Conflict between xunicode and fontspec?

Wed Feb 6 01:01:04 CET 2008

On 5 Feb 2008, at 6:25 pm, Julien ÉLIE wrote:

> Hi Arthur,
>
> First of all, thanks for your answer.
>
>> You should not use inputenc and fontenc with XeTeX, they simply don't
>> support XeTeX at all.
>
> Well, I have just tried polyglossia:
>
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> % encoding: utf-8
>
> \documentclass[a4paper,12pt]{article}
>
> \usepackage{polyglossia}
> \setdefaultlanguage{french}
>
> \usepackage{xltxtra}
> \usepackage{hyperref}
>
> \begin{document}
>
> âêîôû \textbf{âêîôû}
>
> test ! test ! test! test~!
>
> \end{document}
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
>
> The accents are correct, but I do not get the right spaces before "!"...
> The first one is a non-breaking space and the second one is a normal
> space. And the result is that I have *two* spaces for the first one

This sounds like polyglossia doesn't recognize the non-breaking space  
as a "space", and so adds space of its own; I expect François can  
update this.

> and different kinds of *one* space for the others.

In the case of "test!", I think polyglossia is providing a \kern of a  
certain width. Presumably "test !" and "test~!" simply give you the  
standard interword space, which may not be the same width.

>
> It is not good at all...
>
> However, if I add:
>
>     \usepackage[latin1]{inputenc}
>     \usepackage[T1]{fontenc}
>
> The result is fine!

I guess \usepackage[latin1]{inputenc} has the effect of converting  
some of the accented characters, and probably the non-breaking space,  
into LaTeX control sequences, and then some internal macros may deal  
with them differently. However, this is not a good idea in xelatex;  
if you think about it, you're actually misleading the software,  
claiming that your text is Latin-1 when in fact it is UTF-8!

The only reason your accented characters survived at all is that  
their Unicode values happen to coincide with their Latin-1  
codepoints. So after xetex has decoded the UTF-8 bytes into Unicode  
characters, the inputenc package then "decodes" those character  
values into LaTeX macros. But this will not work in most other cases;  
you were lucky that Latin-1 and Unicode happen to share codepoints  
for the characters of interest.

I don't know exactly how fontenc gets involved here; it may mean that  
you end up using different virtual fonts, or something. Did you try  
this in combination with fontspec-selected fonts, not just the  
default CM/LM?
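For that comparison, a variant of your test file that drops inputenc and fontenc and selects the font through fontspec might look something like the following. This is only a sketch: it assumes the OpenType Latin Modern fonts are installed and visible to XeTeX under the name "Latin Modern Roman" (any installed Unicode font can be substituted), and the first space before "!" must be typed in the file as a literal U+00A0 NO-BREAK SPACE, as in your original.

```latex
% encoding: utf-8
% Compile with xelatex. No inputenc or fontenc: XeTeX reads the
% UTF-8 source directly as Unicode characters.
\documentclass[a4paper,12pt]{article}

\usepackage{fontspec}
% Assumption: the OpenType Latin Modern fonts are installed;
% substitute any Unicode font available on your system.
\setmainfont{Latin Modern Roman}

\usepackage{polyglossia}
\setdefaultlanguage{french}

\usepackage{xltxtra}
\usepackage{hyperref}

\begin{document}

âêîôû \textbf{âêîôû}

% The four variants: no-break space (type U+00A0 here),
% normal space, no space, and a tie.
test ! test ! test! test~!

\end{document}
```

If the spacing before "!" is still wrong with this file, the problem lies in polyglossia's French punctuation handling rather than in the font setup.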

If there are language-specific issues like space before footnotes  
that polyglossia doesn't yet handle, I hope François will consider  
adding support for these; I think this is a much better way forward  
than trying to use combinations of old stuff (built for legacy byte  
encodings and fonts) and the new Unicode mechanisms.

JK