[XeTeX] Quote from TeXhax thread
Jonathan Kew
jonathan_kew at sil.org
Thu Oct 18 13:06:40 CEST 2007
On 18 Oct 2007, at 11:16 am, Axel E. Retif wrote:
> I can't help quoting none other than Sebastian Rahtz from this TeXhax
> thread:
>
> http://tug.org/pipermail/texhax/2007-October/thread.html#9150
>
> --------------
>
> From: Sebastian Rahtz <sebastian.rahtz at oucs.ox.ac.uk>
> Date: 15 October, 2007 07:03:29
> To: Chris Rowley <C.A.Rowley at open.ac.uk>
> Cc: texhax at tug.org, tex-live at tug.org, madduck at debian.org,
> 446476 at bugs.debian.org, Ralf Stubner <ralf.stubner at web.de>
> Subject: Re: [texhax] [tex-live] Bug#446476: Bug#446476: Bug#446476:
> natbib cannot handle utf8
>
> Personally, I gave up on all this ucs.sty/inputenc stuff last year
> and switched to xelatex instead. Now it's native Unicode joy every
> time, instead of endless problems once you stray beyond the latin
> world....
>
> --
> Sebastian Rahtz
> ---------------

Yes, I saw that comment on the TeX Live list, though I don't keep up
with texhax.

> I think Jonathan Kew and other knowledgeable people on this list
> should take a look at that thread, especially at Martin Schröder's
> qualms about input encodings.

It's true that there is a backward compatibility issue with legacy 8-bit
encodings. In an ideal world, I suppose the inputenc package would be
xetex-aware, and "do the right thing" to map the 8-bit text to Unicode
when running under xelatex. But even that isn't a complete solution, as
there would then be a mismatch with legacy (non-Unicode) TeX fonts!

To get full backward compatibility, you'd have to switch xetex to
"bytes" input mode (with \XeTeXinputencoding "bytes") and let it process
the input exactly as standard TeX does.
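
For illustration, a minimal sketch of what that switch could look like at the top of a legacy 8-bit source file. The \ifx guard is an assumption added here, not something LaTeX or inputenc provides: it lets the same file still run under pdfTeX, where \XeTeXinputencoding is undefined. The primitive affects the current file from the following line on, so it sits on its own line before \documentclass.

```latex
% First lines of a legacy 8-bit source file (illustrative sketch).
% Under pdfTeX the primitive is undefined, so \csname yields \relax
% and nothing in the \else branch is executed.
\expandafter\ifx\csname XeTeXinputencoding\endcsname\relax
\else
  % XeTeX only: read the following lines of this file one byte per
  % character, as an 8-bit TeX engine would.
  \XeTeXinputencoding "bytes"\relax
\fi
\documentclass{article}
```

Whether the rest of such a document then behaves as it did under an 8-bit engine still depends on its fonts and packages.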

A tricky aspect of this would be the handling of auxiliary files, as
xetex will always write these as UTF-8. So even if you're using "bytes"
as the input encoding for your actual document files, you'd need to
reset to UTF-8 whenever LaTeX reads back an aux file that it has
written.
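
One possible way to do that reset by hand, as a sketch only. \inputlegacy is a hypothetical helper, not part of LaTeX or XeTeX. It assumes XeTeX's \XeTeXdefaultencoding primitive, which sets the encoding of files opened afterwards, and it keeps "bytes" in force only while the named legacy file is being read, so the .aux files LaTeX reads back later are opened as UTF-8.

```latex
% Hypothetical helper (XeTeX only): read one legacy 8-bit file as
% bytes, then return to UTF-8 for every file opened afterwards,
% including the .aux files LaTeX reads back.
\newcommand\inputlegacy[1]{%
  \XeTeXdefaultencoding "bytes"\relax % files opened from here on: bytes
  \input{#1}%
  \XeTeXdefaultencoding "utf8"\relax  % xetex writes .aux files as UTF-8
}
```

In the document body one would then write \inputlegacy{chapter1}, where chapter1.tex stands for a hypothetical Latin-1 file. The reset runs only after that file has been read to the end, because TeX finishes the opened file before returning to the rest of the macro.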

My own view is that when you adopt xetex features (such as fontspec
for font loading), you should move the source document to Unicode,
and not try to straddle the 8-bit and Unicode worlds. And to run old
documents unchanged, the simplest solution is to use an old engine!

I'd love to see \usepackage[utf8]{inputenc} be enhanced to recognize
xetex and "do nothing" in this case, leaving the engine to handle the
Unicode data. Whether it's worth trying to deal with other inputenc
options as well is more questionable, IMO. But I just haven't had
time to pursue this (and there hasn't been great pressure from users;
people seem happy enough to remove \usepackage{inputenc} from their
xelatex documents).
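
Until inputenc does this itself, a document can make the check on its own. This is a minimal user-level sketch, not the inputenc enhancement described above. It assumes only that the \XeTeXrevision primitive is defined under XeTeX and undefined under pdfTeX.

```latex
\documentclass{article}
% \csname yields \relax when \XeTeXrevision is undefined, i.e. when
% the engine is not XeTeX.
\expandafter\ifx\csname XeTeXrevision\endcsname\relax
  \usepackage[utf8]{inputenc} % 8-bit engine: decode UTF-8 input
  \usepackage[T1]{fontenc}    % font encoding with accented glyphs
\else
  \usepackage{fontspec}       % XeTeX: Unicode input is handled natively
\fi
\begin{document}
café, naïve
\end{document}
```

The same source should then compile under both pdflatex and xelatex.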
JK