[XeTeX] Bug in XeTeX 0.997?
Jonathan Kew
jonathan_kew at sil.org
Wed Jan 16 11:02:12 CET 2008
On 15 Jan 2008, at 8:24 pm, Ross Moore wrote:
> Hi Youssef and Jonathan,
>
> On 15/01/2008, at 11:35 PM, Youssef Jabri wrote:
>
>> Hi Jonathan, Hi everybody,
>>
>> I am preparing a new version of Arabi that works with both eTeX and
>> XeTeX, and things work quite well so far.
>> But I noticed that the following code works with eTeX and XeTeX 0.996
>> but fails with version 0.997:
>>
>> \documentclass{article}
>> % utf8 is part of the ArabTeX package, which handles Unicode on its
>> % own; it predates the utf8 code used by the inputenc package.
>> \usepackage{arabtex,utf8}
>> \begin{document}
>> bla bla
>> \end{document}
>>
>> I am using the mac intel binary from
>> http://minimals.contextgarden.net/current/bin/xetex/
>
> I can confirm this.
> It fails with version 0.997 whereas it works with 0.996.
I'm a bit surprised it worked in 0.996, actually... I guess xetex
is being a bit stricter about reading UTF-8 now.
> The error message is:
>
> Runaway definition?
> ->\global \let \a@scan \utfc@scan \global \def \sc@beg {\utf@beg }
> \global \ETC.
> ! File ended while scanning definition of \set@utfc.
>
>
> The actual point of failure is at line 31 in .../arabtex/utf8.sty
>
> \catcode `· 11
>
> This \catcode setting does not work properly and causes the '}'
> at the end of the following line to be not recognised as being
> the end of the replacement tokens for \gdef\set@utfc{...
The problem arises because .../arabtex/utf8.sty is an 8-bit, non-
Unicode file, which xetex tries to interpret as UTF-8. When it sees
the (single byte) code for what's appearing here as a bullet, this is
taken as the first byte of a multi-byte UTF-8 sequence.
>
> If an extra '}' is appended, the definition is completed,
> but not as the author intended; viz.
>
>> \set@utfc=macro:
> ->\global \let \a@scan \utfc@scan \global \def \sc@beg {\utf@beg }
> \global \def
> \sc@word {\utf@word }\global \a@digits = {0123456789}\global \a@first
> = {Ύϕ^^
> 92^^8d}\catcode `\BAD.1 \a@message {input encoding set to UTF-8
> conventi
> ons}}.
> l.35 \show\set@utfc
>
>
> Note the "\catcode `\BAD.1 " and the extra "}" at the end of these
> expansion tokens; whereas with XeTeX v0.996 the correct expansion
> is:
>
>> \set@utfc=macro:
> ->\global \let \a@scan \utfc@scan \global \def \sc@beg {\utf@beg }
> \global \def
> \sc@word {\utf@word }\global \a@digits = {0123456789}\global \a@first
> = {Ύϕ^^
> 92^^8d}\catcode `1 \a@message {input encoding set to UTF-8
> convention
> s}.
> l.35 \show\set@utfc
While that runs without error messages, it is not the
"correct" (intended) meaning of the code. Note the \catcode command,
which is going to set the catcode of some arbitrary character
(showing as a ".notdef" box in my email) to 1, not to 11 as
originally intended. This is because the bytes following the "bullet"
byte were consumed by xetex's UTF-8 interpretation.
>
>
> This problem seems to be by-passed by changing line 31 to read:
> \catcode `\· 11
>
> but then a similar problem occurs at line 1300 in .../arabtex/
> apatch.sty
> which is fixed the same way.
>
>
> These small edits do not adversely affect XeTeX v0.996 either,
> so far as I can tell without actually setting anything in Arabic.
> Certainly the packages now load without errors.
While they may load without errors, they are probably not performing
their intended function (which is probably not needed anyway in XeTeX).
To read a file like this "correctly" with xetex, you'd need to set
the input encoding form to "bytes". Then for the UTF-8 macro support
to work as intended, you'd need to do the same with the actual text
files, too. But far better to forget all this and simply allow xetex
to process the UTF-8 text natively.
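
For anyone who does want to try the "bytes" route, a rough and untested
sketch follows. It assumes the xetex build in use provides the
\XeTeXdefaultencoding and \XeTeXinputencoding primitives, and that the
document text itself is meant to be handled by the ArabTeX UTF-8 macros
rather than by xetex. Whether ArabTeX then behaves as intended under
xetex is a separate question that this does not answer.

```latex
% Sketch only: read the legacy 8-bit ArabTeX files as raw bytes.
\documentclass{article}
% Files opened after this point are read byte by byte, not as UTF-8.
\XeTeXdefaultencoding "bytes"
\usepackage{arabtex,utf8}
% Restore the default for files opened later (for example the .aux file).
\XeTeXdefaultencoding "utf-8"
% Switch the remainder of this file to bytes as well, so that the
% ArabTeX macros see the individual UTF-8 bytes of the text.
\XeTeXinputencoding "bytes"
\begin{document}
bla bla
\end{document}
```

The default is switched back before \begin{document} so that other
files are still read as UTF-8; only the package files and the rest of
the main file are treated as bytes.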
JK