[texhax] Some puzzling TeX

Stephen Hicks sdh33 at cornell.edu
Mon Feb 21 02:08:10 CET 2011

On Sun, Feb 20, 2011 at 8:13 AM, Uwe Lueck <uwe.lueck at web.de> wrote:
> "Philip Taylor (Webmaster, Ret'd)" <P.Taylor at Rhul.Ac.Uk> wrote 20.02.2011 01:40:24:
>>Uwe Lueck wrote:
>>> "Stephen Hicks"  wrote 17.02.2011 01:50:44:
>>>> catcode 16
>>> What's that?
>> See page 209.
> Yes, thanks, ... perhaps.
> What evidence is there besides this one of Knuth's notoriously unreliable ("incredible") claims?
> I am unable to get a catcode of \relax or \bgroup by \showthe\catcode.
> Only with \ifcat, I can see that \relax and \bgroup are different,
> while, e.g., after \let\BGROUP{\let\EGROUP}, \bgroup and \BGROUP
> are the same according to \ifcat.
> My conclusion at the moment is that one might better say that
> with \ifcat, control sequences behave "as if they had catcode 16 or ..."
> Especially, it seems to me that instead of "catcode 16" one could
> as well speak of "catcode -1" or anything else that is not among
> 0, ..., 15.
> I am almost a Pascal and truly a C illiterate,
> so can't read the code of TeX, that may allow a more specific statement.

You've provoked me to investigate.  While the TeXbook does claim that
TeX treats control sequences as catcode 16 and character code 256, the
source code suggests otherwise as far as the catcode goes.  Here's a
trace through what happens on \ifcat\noexpand\controlseq:

[Test if two characters match §506] calls get_x_token_or_active_char
twice, comparing the (possibly-modified) values of cur_cmd.  This leads
to get_x_token (§380), which calls get_next (§341).  Here we'll assume
the tokens are coming from a token list rather than an input stream,
so we end up in [Input from token list ... §357], which sets cur_cmd =
no_expand (= 103).

Back in get_x_token (§380), since max_command (= 100) < no_expand <
call (= 111), we call expand (§366).  Since cur_cmd = no_expand < call,
we [Expand a nonmacro §367] and ultimately [Suppress expansion of the
next token §369].  That calls get_token (§365), which calls get_next a
second time, returning cur_cmd := call = 111 (assuming that \controlseq
is neither \outer nor \long), with cur_cs identifying \controlseq and
cur_chr pointing to the macro's definition.  §369 then saves the packed
cur_tok in t and calls back_input (§325) to "unread" \controlseq.
Since cur_tok was a control sequence, we insert a permanent
\notexpanded at the front of the input stream.

We now return from expand (§366) back to get_x_token (§380) and goto
restart.  The second time through, get_next (§341) sees the
\notexpanded, which sets cur_cmd to dont_expand (§210, §258) and jumps
to [Get the next token, suppressing expansion §358].  That sets
cur_cmd := relax (= 0, §207) and cur_chr := no_expand_flag, because
\controlseq, which is identified by cur_cs, had a cur_cmd >
max_command.  Back in get_x_token (§380) we see cur_cmd = 0 <
max_command, so we're done.

Back in get_x_token_or_active_char (§506), we now find cur_cmd = relax
and cur_chr = no_expand_flag, so we set cur_cmd := active_char (= 13
(!)) and cur_chr := cur_tok - cs_token_flag - active_base.  This seems
strange, but the prose at the top of the section tips us off to the
fact that "active characters have the smallest tokens", and therefore
the check "if (cur_cmd > active_char) or (cur_chr > 255)" does indeed
fire because of the second clause.  Thus, we (effectively) set cur_cmd
= relax = 0 (with character code 256), which is compared to the
catcode of the next token.
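
The end result of that trace can be checked from the TeX side.  This is
only a minimal plain TeX sketch; \foo and \report are names made up for
illustration and have nothing to do with tex.web:

```tex
% Plain TeX sketch.  \foo and \report are hypothetical helper names.
\def\foo{never expanded here}
\def\report#1#2{\message{[#1: #2]}}

% \noexpand\foo should be seen by \ifcat as (relax, 256), just like \relax.
\ifcat\noexpand\foo\relax
  \report{foo vs relax}{same}\else\report{foo vs relax}{different}\fi

% \bgroup is an implicit character of catcode 1, so it differs from \relax.
\ifcat\relax\bgroup
  \report{relax vs bgroup}{same}\else\report{relax vs bgroup}{different}\fi

% An active character under \noexpand keeps cur_cmd = active_char (13),
% because its cur_chr stays <= 255; plain TeX makes ~ active.
\ifcat\noexpand~\noexpand\foo
  \report{tilde vs foo}{same}\else\report{tilde vs foo}{different}\fi

\bye
```

If the trace is right, the three messages should read "same",
"different", "different", in that order.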

Whew...  so, it looks like this myth of "control sequences have
catcode 16" can finally be put to rest.  As far as \ifcat is
concerned, the catcode of a control sequence is, in fact, 0.  But
since there's no way to ever get a naked escape token (catcode 0) into
the input stream, the fact that the two share a catcode is irrelevant.

