[XeTeX] xetex file organization
Jonathan Kew
jonathan_kew at sil.org
Wed Nov 3 16:48:35 CET 2004
On 3 Nov 2004, at 3:04 pm, Bruno Voisin wrote:
> All the rest looks fine to me, but again I'm not a specialist. That
> said:
>
>> tex/
>>   generic/
>>     hyphen/
>>       (Unicode-compatible versions of hyphenation files;
>>       these are designed to still work with standard TeX as well)
>
> Hopefully at some point in the future, when/if the capability of
> reading files in a specific encoding is added to XeTeX, this directory
> would become unnecessary (as well as the modified version of url.sty
> in texmf.gwtex).
Actually, I've now implemented support for reading files in non-Unicode
encodings (but haven't released a version including this yet). So that
you know what's coming: you will be able to say
\XeTeXinputencoding "encoding-name"
(where "encoding-name" is scanned like a filename by XeTeX, with
optional quotes). The "encoding-name" can be one of a set of built-in
names:
auto (the default setting, auto-detects utf8 or utf16 files)
utf8
utf16 (platform-native utf16, i.e., big-endian on Mac OS X)
utf16be
utf16le
bytes (reads individual bytes directly as character codes 0..255)
or it can be an "internet encoding name" recognized by the Mac OS Text
Encoding Converter; so you can say things like:
\XeTeXinputencoding "x-mac-roman"
\XeTeXinputencoding "windows-1252"
\XeTeXinputencoding "iso-8859-4"
\XeTeXinputencoding "big5"
etc., and the text will be converted from that encoding to Unicode as
the file is read.
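
To make the difference between these settings concrete, here is a minimal sketch in Python, not XeTeX code. It assumes Python's own codecs as a stand-in for the Mac OS Text Encoding Converter, and the helper name decode_as is hypothetical. Python spells some names differently ("mac-roman" rather than "x-mac-roman"), so the names here are Python's, not necessarily ones XeTeX accepts. It shows the same four bytes turning into different Unicode text depending on the declared encoding:

```python
# Illustration only: Python codecs stand in for the OS text converter.
# decode_as is a hypothetical helper, not part of XeTeX.

def decode_as(data: bytes, encoding_name: str) -> str:
    """Convert raw file bytes to Unicode text under a named encoding."""
    if encoding_name == "bytes":
        # "bytes" mode: each byte becomes the character code of the
        # same value (0..255), with no conversion at all.
        return "".join(chr(b) for b in data)
    return data.decode(encoding_name)

raw = bytes([0x63, 0x61, 0x66, 0xE9])  # "caf" followed by the byte 0xE9

as_bytes = decode_as(raw, "bytes")             # 0xE9 -> U+00E9
as_cp1252 = decode_as(raw, "windows-1252")     # 0xE9 -> U+00E9 (e-acute)
as_macroman = decode_as(raw, "mac-roman")      # 0xE9 -> U+00C8 (E-grave)

# A multi-byte legacy encoding: two bytes become one Unicode character.
big5_char = decode_as(b"\xa4\xa4", "big5")

# UTF-8 text read in "bytes" mode is NOT decoded: the two-byte sequence
# for e-acute stays as two separate character codes.
utf8_raw = "caf\u00e9".encode("utf-8")
undecoded = decode_as(utf8_raw, "bytes")
```

The point of the sketch is only that the byte 0xE9 means different characters in different legacy encodings, so the reader has to be told which one applies before the file is read.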
The encoding used to read a file (either \input or \openin) is
determined at the time the file is opened; it can't be changed on the
fly.
Note that it may still be necessary to adapt hyphenation files, though,
as many of them are written in terms of specific legacy encodings using
TeX-level mechanisms (active characters, ^^xx sequences, etc.). These
mechanisms won't be affected by the \XeTeXinputencoding setting.
Although such files can safely be read by XeTeX, they may not provide
the appropriate hyphenation rules for text that actually uses the
Unicode character codes for the given language.
In a case like url.sty, yes, I'd guess that simply reading it in Latin1
ought to solve the problems people have had. How best to ensure that
this happens is another question.... I'm still thinking about that.
Jonathan