[XeTeX] Conflict between xunicode and fontspec?

Jonathan Kew jonathan_kew at sil.org
Wed Feb 6 01:01:04 CET 2008


On 5 Feb 2008, at 6:25 pm, Julien ÉLIE wrote:

> Hi Arthur,
>
> First of all, thanks for your answer.
>
>> You should not use inputenc and fontenc with XeTeX, they simply don't
>> support XeTeX at all.
>
> Well, I have just tried polyglossia:
>
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> % encoding: utf-8
>
> \documentclass[a4paper,12pt]{article}
>
> \usepackage{polyglossia}
> \setdefaultlanguage{french}
>
> \usepackage{xltxtra}
> \usepackage{hyperref}
>
> \begin{document}
>
> âêîôû \textbf{âêîôû}
>
> test ! test ! test! test~!
>
> \end{document}
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
>
> Accents are good but I do not have the right spaces before "!"...
> The first one is an unbreakable space and the second one is a normal
> space.  And the result is that I have *two* spaces for the first one

This sounds like polyglossia doesn't recognize the non-breaking space  
as a "space", and so adds space of its own; I expect François can  
update this.

> and different kinds of *one* space for the others.

In the case of "test!", I think polyglossia is providing a \kern of a  
certain width. Presumably "test !" and "test~!" simply give you the  
standard space, which may not be the same.

>
> It is not good at all...
>
> However, if I add:
>
>     \usepackage[latin1]{inputenc}
>     \usepackage[T1]{fontenc}
>
> The result is fine!

I guess \usepackage[latin1]{inputenc} has the effect of converting  
some of the accented characters, and probably the non-breaking space,  
into LaTeX control sequences, and then some internal macros may deal  
with them differently. However, this is not a good idea in xelatex;
if you think about it, you're actually misleading the software,
claiming that your text is Latin-1 when in fact it is UTF-8!

The only reason your accented characters survived at all is that  
their Unicode values happen to coincide with their Latin-1  
codepoints. So after xetex has decoded the UTF-8 bytes into Unicode  
characters, the inputenc package then "decodes" those character  
values into LaTeX macros. But this will not work in most other cases;  
you were lucky that Latin-1 and Unicode happen to share codepoints  
for the characters of interest.
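
To make the coincidence concrete, here is a small sketch (not taken from the original test file) that prints the character codes xetex sees after decoding UTF-8. It assumes the file is saved as UTF-8 and compiled with xelatex, with no inputenc loaded. The choice of sample characters is mine; only the numeric codes are typeset, so the default font is sufficient.

```latex
% Sketch: compile with xelatex, file saved as UTF-8, no inputenc.
\documentclass{article}
\begin{document}

% `X yields the character code of X; \number prints it in decimal.
% U+00E2 has the same value as the Latin-1 byte for a-circumflex.
a-circumflex: \number`â.

% U+0153 (oe ligature) lies above 255, so no Latin-1 byte matches it.
oe ligature: \number`œ.

% U+20AC (euro sign) is likewise far outside the Latin-1 range.
euro sign: \number`€.

\end{document}
```

Only the first value fits in the 0..255 range that a Latin-1 declaration can describe, which is why the accented characters in the test file happened to survive.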

I don't know exactly how fontenc gets involved here; it may mean that  
you end up using different virtual fonts, or something. Did you try  
this in combination with fontspec-selected fonts, not just the  
default CM/LM?
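
For reference, a minimal variant of the test file with a fontspec-selected font and no inputenc or fontenc might look like the sketch below. The font name is only an assumption (any OpenType font installed on the system should do), and fontspec is loaded directly here rather than through xltxtra as in the original sample.

```latex
% Sketch: compile with xelatex, file saved as UTF-8.
\documentclass[a4paper,12pt]{article}

\usepackage{fontspec}
% Assumption: this font is installed; substitute any available font.
\setmainfont{Latin Modern Roman}

\usepackage{polyglossia}
\setdefaultlanguage{french}

\begin{document}

âêîôû \textbf{âêîôû}

test ! test ! test! test~!

\end{document}
```

Comparing the spacing before "!" in this output with the inputenc/fontenc version would show whether the difference depends on the font setup at all.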

If there are language-specific issues like space before footnotes  
that polyglossia doesn't yet handle, I hope François will consider  
adding support for these; I think this is a much better way forward  
than trying to use combinations of old stuff (built for legacy byte  
encodings and fonts) and the new Unicode mechanisms.

JK


