[XeTeX] Quote from TeXhax thread

Jonathan Kew jonathan_kew at sil.org
Thu Oct 18 13:06:40 CEST 2007


On 18 Oct 2007, at 11:16 am, Axel E. Retif wrote:

> I can't help quoting none other than Sebastian Rahtz from this TeXhax
> thread:
>
> http://tug.org/pipermail/texhax/2007-October/thread.html#9150
>
> --------------
>
> From: Sebastian Rahtz <sebastian.rahtz at oucs.ox.ac.uk>
> Date: 15 October, 2007 07:03:29
> To: Chris Rowley <C.A.Rowley at open.ac.uk>
> Cc: texhax at tug.org, tex-live at tug.org, madduck at debian.org,
> 446476 at bugs.debian.org, Ralf Stubner <ralf.stubner at web.de>
> Subject: Re: [texhax] [tex-live] Bug#446476: Bug#446476: Bug#446476:
> natbib cannot handle utf8
>
> Personally, I gave up on all this ucs.sty/inputenc stuff last year
> and switched to xelatex instead. Now it's native Unicode joy every
> time, instead of endless problems once you stray beyond the latin
> world....
>
> --  
> Sebastian Rahtz
> ---------------

Yes, I saw that comment on the TeX Live list, though I don't keep up  
with texhax.

> I think Jonathan Kew and other knowledgeable people on this list
> should take a look at that thread, especially at Martin Schröder's
> qualms about input encodings.

It's true that there is a backward compatibility issue with legacy
8-bit encodings. In an ideal world, I suppose the inputenc package
would be xetex-aware, and "do the right thing" to map the 8-bit text  
to Unicode when running under xelatex. But even that isn't a complete  
solution, as there would then be a mismatch with legacy (non-Unicode)  
TeX fonts!

To get full backward compatibility, you'd have to switch xetex to
"bytes" input mode (with \XeTeXinputencoding), and let it process the
input exactly like standard TeX does.
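
A minimal sketch of the byte-oriented input mode described above, assuming
the primitive is placed at the very top of a legacy 8-bit source file. Only
the \XeTeXinputencoding line comes from the discussion; where it is placed
is an illustrative assumption, and whether the rest of a given document
(inputenc, legacy fonts) cooperates under xelatex is something to test, not
something this fragment establishes.

```latex
% Sketch only: first line of a legacy 8-bit source file run under xelatex.
% The text that follows is read one byte per character (codes 0-255),
% with no conversion to Unicode, much as an 8-bit TeX engine would read it.
\XeTeXinputencoding "bytes"
```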

A tricky aspect of this would be the handling of auxiliary files, as  
xetex will always write these as UTF-8. So even if you're using  
"bytes" as the input encoding for your actual document files, you'd  
need to reset to UTF-8 whenever LaTeX reads back an aux file that  
it's written.
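
The split between document files and auxiliary files might be sketched as
follows. This assumes that \XeTeXinputencoding applies to the file currently
being read, while \XeTeXdefaultencoding applies to files opened afterwards;
the encoding name "UTF-8" and the placement at the top of the main file are
assumptions for illustration.

```latex
% Sketch only: top of the main (legacy 8-bit) source file.
\XeTeXinputencoding "bytes"    % read the rest of THIS file as raw bytes
\XeTeXdefaultencoding "UTF-8"  % files opened later, such as the .aux file
                               % that xetex wrote, are read as UTF-8
```

Under the same assumption, any further legacy 8-bit file pulled in with
\input would be read with the default encoding, so it would need its own
\XeTeXinputencoding "bytes" line at the top.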

My own view is that when you adopt xetex features (such as fontspec  
for font loading), you should move the source document to Unicode,  
and not try to straddle the 8-bit and Unicode worlds. And to run old  
documents unchanged, the simplest solution is to use an old engine!

I'd love to see \usepackage[utf8]{inputenc} be enhanced to recognize  
xetex and "do nothing" in this case, leaving the engine to handle the  
Unicode data. Whether it's worth trying to deal with other inputenc  
options as well is more questionable, IMO. But I just haven't had  
time to pursue this (and there hasn't been great pressure from users;  
people seem happy enough to remove \usepackage{inputenc} from their  
xelatex documents).
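
Until inputenc does this itself, a preamble can make the same decision by
hand. The sketch below assumes the ifxetex package (not mentioned above) for
engine detection; the font setup in each branch is only an example.

```latex
\documentclass{article}
\usepackage{ifxetex}            % assumption: provides the \ifxetex conditional
\ifxetex
  \usepackage{fontspec}         % xelatex: the engine reads UTF-8 natively
\else
  \usepackage[utf8]{inputenc}   % 8-bit engines: decode UTF-8 via inputenc
  \usepackage[T1]{fontenc}
\fi
\begin{document}
Schröder
\end{document}
```

The same UTF-8 source file can then be run through either latex or xelatex
without editing the preamble.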

JK


