[XeTeX] ifcat changed?

Bruno Le Floch blflatex at gmail.com
Sun Apr 16 15:19:05 CEST 2017


Filed https://sourceforge.net/p/xetex/bugs/138/ with text essentially
identical to my message below, which explains the bug's origin and how
to fix it.

On 04/16/2017 06:50 AM, Julian Bradfield wrote:
> On 2017-04-16, Zdenek Wagner <zdenek.wagner at gmail.com> wrote:
>> 2017-04-16 10:08 GMT+02:00 Julian Bradfield <jcb+xetex at jcbradfield.org>:
> ....
>>> Definitely a bug. The TeXbook defines the behaviour of \if and \ifcat,
>>> and all control sequences are considered to have character code 256
>>> and category code 16, unless \let equal to a non-active character, in
>>> which case they have the value of that character.
>>>
>> Not all control sequences but primitives. Unlike \ifx, \if and \ifcat
>> perform full expansion.
> 
> (a) Yes, they do perform expansion. That's irrelevant to the point at
>     hand, since expansion happens before the comparison.
> (b) All control sequences, not just primitives:
> 
> \ifcat\noexpand\foo\noexpand\baz true\else false \fi
> 
> \ifcat\noexpand\foo\halign true\else false \fi
> 
> As Philip pointed out, I was reporting Knuth's words, which are by
> definition authoritative.

As far as I can tell from the sources, the bug has likely been there
from the start, and it only affects \span, \cr and \crcr.  Basically,
their character codes are too small.  This can be fixed by changing
"special_char" from 65537 to 1114112 (biggest_usv+1) or so, so that the
values of "span_code", "cr_code" and "cr_cr_code" are above
"biggest_usv".

The test that \ifcat and \if use to distinguish control sequences from
normal/active characters is

    (cur_cmd>active_char)or(cur_chr>biggest_usv)

Most tokens that are not character tokens have "cur_cmd" greater than
"active_char".  All exceptions are primitives, including \relax, \span,
\cr and \crcr.  For these primitives, Knuth made sure that "cur_chr" was
bigger than 255, but some of these values were not increased enough when
XeTeX switched to Unicode.  I believe I went through all cases and only
"span_code", "cr_code" and "cr_cr_code" need to be changed, although it
makes sense to also increase "special_char" (used as a \noexpand
marker).
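
To make the effect of the constants concrete, here is a minimal Python
sketch of the test quoted above.  It is a model, not the WEB source.
The command codes (4 for \span, 5 for \cr and \crcr, 13 for
"active_char") are taken from tex.web, and the sketch assumes that
"span_code", "cr_code" and "cr_cr_code" are "special_char",
"special_char"+1 and "special_char"+2.  The function names are
hypothetical.

```python
# Simplified model of the test (cur_cmd>active_char)or(cur_chr>biggest_usv).
# Assumptions: command codes as in tex.web; the three alignment codes
# are derived from special_char as special_char, +1 and +2.

ACTIVE_CHAR = 13        # command code of active characters
TAB_MARK = 4            # command code of \span (shared with "&")
CAR_RET = 5             # command code of \cr and \crcr
BIGGEST_USV = 0x10FFFF  # largest Unicode scalar value, 1114111


def alignment_primitives(special_char):
    """Return {name: (cur_cmd, cur_chr)} for a given special_char."""
    span_code = special_char
    cr_code = span_code + 1
    cr_cr_code = cr_code + 1
    return {
        r"\span": (TAB_MARK, span_code),
        r"\cr": (CAR_RET, cr_code),
        r"\crcr": (CAR_RET, cr_cr_code),
    }


def is_control_sequence(cur_cmd, cur_chr):
    """True if the test treats the token as a non-character token."""
    return cur_cmd > ACTIVE_CHAR or cur_chr > BIGGEST_USV


def misclassified(special_char):
    """Names of primitives the test would treat as character tokens."""
    return sorted(
        name
        for name, (cmd, chr_code) in alignment_primitives(special_char).items()
        if not is_control_sequence(cmd, chr_code)
    )


if __name__ == "__main__":
    for value in (65537, 1114112):
        print(value, misclassified(value))
```

Under these assumptions, all three primitives fail the test with
"special_char" set to 65537, and none do with 1114112.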

On a related note, I think "define(p,relax,256)" should be
"define(p,relax,too_big_usv)", but I'm not quite following the code
there, so don't trust me.  In particular, I don't see how the XeTeX code
ends up correctly giving TRUE in \chardef\foo=123\ifx\relax\foo TRUE\fi.

Best,

Bruno

