[tex-k] Bug-report for the TeXbook: Not all non-primitive control-sequences are defined, ultimately, in terms of the primitive ones.

Tue Dec 13 10:25:45 CET 2022

> However, I don't believe that a change is warranted.

If it were my book, I'd change it, but it's not, and somehow I don't think it's going to be changed.  As I mentioned before, the replacement texts of \empty and \space (\def\empty{} and \def\space{ } in plain.tex) contain no control sequences at all, so those two macros aren't defined in terms of primitives (see below).

> Note that (from plain.tex):
> 
> 	\def\'#1{{\accent19 #1}}
> 	\def\"#1{{\accent"7F #1}}
> 
> so the characters 1, 9, 7, F, etc. occurring in the definitions of \' and \"
> are excluded from consideration.

As long as we're being pedantic, I don't think they're excluded from consideration as such:  On page 268 of _The TeXbook_, Knuth states:

"Some commands have arguments. [...] the next tokens are swept up as part of the operation, because TeX needs to know what \dimen register is to be set equal to what <dimen> value."

I've taken the opportunity to review how TeX performs tokenization and how Knuth uses the term "primitive".  It appears that he only refers to control sequences as primitives.  In addition, TeX's tokenization works differently from the way I thought it did;  I had assumed it was more like METAFONT's, which I'm more familiar with.  In TeX, the tokens are at a lower level:  apart from control sequences, each token is a single character.  In METAFONT, adjacent input characters of the same class are "collected" to form a single token.  I believe Knuth calls the result "tokens" as well, but I'm not sure and it doesn't really matter.  It's really six of one, half a dozen of the other how the tokenization is performed.  Either way, the way TeX and METAFONT perform scanning and parsing isn't completely analogous to the more conventional way of performing these tasks using lex and yacc, Flex and GNU Bison, or any of the other common tools for writing compilers.
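
To make the difference concrete, here is a minimal Python sketch (for illustration only;  it is not taken from tex.web or mf.web, and all of the names in it are made up).  `tex_like_tokens' assumes plain TeX's default category codes and ignores the special handling of spaces, comments and ^^ notation.  `mf_like_tokens' knows only a handful of METAFONT's character classes and, as a simplification, treats a run of digits as a numeric token.

```python
import string

LETTERS = set(string.ascii_letters)


def tex_like_tokens(text):
    """Roughly TeX-style: a control sequence is one token, and every
    other character is a token by itself.  (Simplified: no catcode
    table, no space skipping, no comments, no ^^ notation.)"""
    tokens = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            j = i + 1
            if text[j] in LETTERS:
                # control word: backslash plus a maximal run of letters
                while j < len(text) and text[j] in LETTERS:
                    j += 1
            else:
                # control symbol: backslash plus exactly one non-letter
                j += 1
            tokens.append(text[i:j])
            i = j
        else:
            tokens.append(ch)
            i += 1
    return tokens


# A reduced, hypothetical subset of METAFONT's character classes.
MF_CLASSES = [
    set(string.ascii_letters + "_"),
    set("<=>:|"),
    set("+-"),
    set("/*\\"),
    set(string.digits),  # simplification: digits only, no decimal point
]


def mf_class(ch):
    """Return the index of the class containing ch, or None for a 'loner'."""
    for index, members in enumerate(MF_CLASSES):
        if ch in members:
            return index
    return None


def mf_like_tokens(text):
    """Roughly METAFONT-style: adjacent characters of the same class
    are collected into one token; whitespace separates tokens."""
    tokens = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
            continue
        cls = mf_class(ch)
        j = i + 1
        if cls is not None:
            while j < len(text) and mf_class(text[j]) == cls:
                j += 1
        tokens.append(text[i:j])
        i = j
    return tokens
```

Given \accent19 #1, the first function returns \accent, 1, 9, a space, # and 1 as six separate tokens.  Given x1:=a+-b; the second one returns x, 1, :=, a, +-, b and a final semicolon, i.e., := and +- each come out as a single token.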

For what it's worth, the way METAFONT (and presumably TeX) performs scanning is very clever and only requires one character of "lookahead".  It's also very simple to implement.  This file contains the definitions of two functions, `yylex' and `sub_yylex', that implement scanning according to MF's rules (with one exception):  https://git.savannah.gnu.org/cgit/3dldf.git/tree/src/scan.web

The exception is that I broke MF's rules in order to implement the operations +=, -=, *= and /=.  However, it was perfectly easy and didn't cause any problems.
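
As a rough illustration of the idea (a hypothetical Python sketch, not the actual code in scan.web;  the names and the reduced class table are assumptions), a scanner can read its input one character at a time, hold a single character of lookahead, and still graft on the +=, -=, *= and /= exception:

```python
import io

# A few of METAFONT's character classes (reduced, for illustration).
SAME_CLASS = ["+-", "/*\\", "<=>:|"]
# Operators that may absorb a following "=" (the rule-breaking exception).
COMPOUND_ASSIGN = {"+", "-", "*", "/"}


def char_class(ch):
    """Return a label for the class of ch, or None for a 'loner'."""
    for members in SAME_CLASS:
        if ch in members:
            return members
    if ch.isalpha() or ch == "_":
        return "letter"
    if ch.isdigit():
        return "digit"
    return None


def scan(stream):
    """Yield tokens from a text stream using one character of lookahead."""
    lookahead = stream.read(1)
    while lookahead:
        ch = lookahead
        lookahead = stream.read(1)
        if ch.isspace():
            continue
        token = ch
        cls = char_class(ch)
        if ch in COMPOUND_ASSIGN and lookahead == "=":
            # The exception: "=" joins an operator from a different class.
            token += lookahead
            lookahead = stream.read(1)
        elif cls is not None:
            # The normal rule: collect characters of the same class.
            while lookahead and char_class(lookahead) == cls:
                token += lookahead
                lookahead = stream.read(1)
        yield token


tokens = list(scan(io.StringIO("a += b+-c; x:=y*=2")))
```

Without the exception branch, + and = would come out as two tokens, because they belong to different classes.  With it, the sketch yields += and *= as single tokens while := and +- are still collected by the ordinary same-class rule.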

It's probably easier to understand this code than the code in _METAFONT:  The Program_ or _TeX:  The Program_, reading either of which would be a real challenge (I've never done it myself).

> Sent: Tuesday, December 13, 2022 at 09:13
> From: "Paul Vojta" <vojta at math.berkeley.edu>
> To: ud.usenetcorrespondence at web.de
> Cc: tex-k at tug.org
> Subject: Re: [tex-k] Bug-report for the TeXbook: Not all non-primitive control-sequences are defined, ultimately, in terms of the primitive ones.
>
> On Sun, Dec 11, 2022 at 03:27:23PM +0100, ud.usenetcorrespondence at web.de wrote:
> > Laurence.Finston at gmx.net wrote.
> > 
> > > I am quite certain that the use of \def, \edef, \xdef or \let was not meant,
> > > but rather that the expansion of the macros was meant.  The passage is a
> > > restatement of how macro languages in general work:  tokens are expanded
> > > until they can't be expanded anymore and the remaining primitives or "terminal
> > > symbols" are then passed to the compiler.
> > > Ulrich is correct, primitives are not only control sequences, they are also
> > > plain characters, the beginning-of-group and end-of-group tokens, the
> > > enter-math-mode and exit-math-mode characters, etc.  It's another way of
> > > putting what he otherwise describes as TeX's "mouth" and "stomach" (perhaps
> > > not his most appetizing image) --- what other people call  scanning and parsing.
> 
> [snip]
> 
> I believe that the sentence in question might have been written more
> precisely as:
> 
> 	All other control sequences are defined, ultimately, such that
> 	the only control sequences are the primitive ones.
> 
> Or, more pedantically, as:
> 
> 	All other control sequences ultimately expand to a token string
> 	in which the only control sequences are the primitive ones.
> 
> However, I don't believe that a change is warranted.
> 
> Chapter 3 is all about control sequences, so "in terms of" is implicitly
> restricted to control sequences.  This is supported by (and communicated by)
> the last sentence:
> 
> 	For example, \input is a primitive operation, but \' and \" are not;
> 	the latter are defined in terms of an \accent primitive.
> 
> Note that (from plain.tex):
> 
> 	\def\'#1{{\accent19 #1}}
> 	\def\"#1{{\accent"7F #1}}
> 
> so the characters 1, 9, 7, F, etc. occurring in the definitions of \' and \"
> are excluded from consideration.
> 
> Sincerely,
> 
> 
> Paul Vojta
>