[tex-k] Bug-report for the TeXbook: Not all non-primitive control-sequences are defined, ultimately, in terms of the primitive ones.

Paul Vojta vojta at math.berkeley.edu
Wed Dec 14 01:39:38 CET 2022


> If it were my book, I'd change it, but it's not and somehow I don't think it's going to be changed.  As I mentioned before, \empty and \space don't contain control sequences, so they aren't defined in terms of primitives (see below).

So then it is true that \empty "expands to a sequence of tokens in which the
only control sequences are the primitive ones" (in this case the assertion
is vacuously true).
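
For reference, plain.tex defines both of them without any control sequences in
their replacement texts:

	\def\empty{}
	\def\space{ }

so there is nothing left in either expansion that could fail to be primitive.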


> > Note that (from plain.tex):
> > 
> > 	\def\'#1{{\accent19 #1}}
> > 	\def\"#1{{\accent"7F #1}}
> > 
> > so the characters 1, 9, 7, F, etc. occurring in the definitions of \' and \"
> > are excluded from consideration.
> 
> As long as we're being pedantic, I don't think they're excluded from consideration as such:  On page 268 of _The TeXbook_, Knuth states:
> 
> "Some commands have arguments. [...] the next tokens are swept up as part of the operation, because TeX needs to know what \dimen register is to be set equal to what <dimen> value."

Yes, but on page 267 Knuth says, ``We shall study [in Chapters 24-26]
TeX's digestive processes, i.e., what TeX does with the lists of tokens
that arrive in its `stomach.' ''

However, in Chapter 3, Knuth is talking about macro expansion, which happens
in the ``gullet'' (further described in Chapter 20).  So, 1, 9, 7, F, etc.
definitely do travel down the esophagus.  Note also that the sentence in
Chapter 3 doesn't say that every control sequence is defined in terms of
boxes and glue.  (If you're going to posit that \accent is being executed,
then you should also posit that \hbox and \vbox are being executed.)
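
To make that concrete, here is an informal sketch of what the gullet does
with \'e (category codes omitted):

	\'e   -->   { \accent 1 9 <space> e }

Only later, in the stomach, does \accent gobble the character tokens 1 and 9
(and the space after them) while scanning its <8-bit number> 19; expansion
itself never touches them.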

Also, note the first paragraph of Chapter 24:  ``... Now it is time to
take a more systematic look at what we have encountered: to consider the
facts in an orderly manner, rather than to mix them up with informal examples
and applications as we have been doing.  A child learns to speak a language
before learning formal rules of grammar, but the rules of grammar come in
handy later on when the child reaches adulthood.''

So, basically, Knuth is saying here that you shouldn't be that pedantic
about statements in early chapters of the book.

--Paul Vojta


On Tue, Dec 13, 2022 at 10:25:45AM +0100, Laurence.Finston at gmx.net wrote:
> > However, I don't believe that a change is warranted.
> 
> If it were my book, I'd change it, but it's not and somehow I don't think it's going to be changed.  As I mentioned before, \empty and \space don't contain control sequences, so they aren't defined in terms of primitives (see below).
> 
> > Note that (from plain.tex):
> > 
> > 	\def\'#1{{\accent19 #1}}
> > 	\def\"#1{{\accent"7F #1}}
> > 
> > so the characters 1, 9, 7, F, etc. occurring in the definitions of \' and \"
> > are excluded from consideration.
> 
> As long as we're being pedantic, I don't think they're excluded from consideration as such:  On page 268 of _The TeXbook_, Knuth states:
> 
> "Some commands have arguments. [...] the next tokens are swept up as part of the operation, because TeX needs to know what \dimen register is to be set equal to what <dimen> value."
> 
> I've taken the opportunity to review how TeX performs tokenization and how Knuth uses the term "primitive".  It appears that he only refers to control sequences as primitives.  In addition, tokenization is performed differently from the way I thought it was; I thought it was more like the way it's done in METAFONT, which is somewhat different and which I'm more familiar with.  In TeX, the tokens are at a lower level, namely, except for control sequences, at the level of individual characters.  In METAFONT, adjacent characters of the same category from the input are "collected" to create a token.  I believe Knuth calls the result "tokens" but I'm not sure and it doesn't really matter.  It's really six of one, half a dozen of the other how the tokenization is performed, and the way TeX and METAFONT perform scanning and parsing isn't completely analogous to the more conventional way of performing these tasks using lex and yacc or Flex and GNU Bison or any of the common tools for writing compilers.
> 
> For what it's worth, the way METAFONT (and presumably TeX) performs scanning is very clever and only requires one character of "lookahead".  It's also very simple to implement.  This file contains the definitions of two functions, `yylex' and `sub_yylex', that implement scanning according to MF's rules (with one exception):  https://git.savannah.gnu.org/cgit/3dldf.git/tree/src/scan.web
> 
> The exception is that I broke MF's rules in order to implement the operations +=, -=, *= and /=.  However, it was perfectly easy and didn't cause any problems.
> 
> It's probably easier to understand this code than the code in _METAFONT: The Program_ or _TeX: The Program_, which would be a real challenge (I've never done it myself).
> 
> 
> > Gesendet: Dienstag, 13. Dezember 2022 um 09:13 Uhr
> > Von: "Paul Vojta" <vojta at math.berkeley.edu>
> > An: ud.usenetcorrespondence at web.de
> > Cc: tex-k at tug.org
> > Betreff: Re: [tex-k] Bug-report for the TeXbook: Not all non-primitive control-sequences are defined, ultimately, in terms of the primitive ones.
> >
> > On Sun, Dec 11, 2022 at 03:27:23PM +0100, ud.usenetcorrespondence at web.de wrote:
> > > Laurence.Finston at gmx.net wrote.
> > > 
> > > > I am quite certain that the use of \def, \edef, \xdef or \let was not meant,
> > > > but rather that the expansion of the macros was meant.  The passage is a
> > > > restatement of how macro languages in general work:  tokens are expanded
> > > > until they can't be expanded anymore and the remaining primitives or "terminal
> > > > symbols" are then passed to the compiler.
> > > > Ulrich is correct: primitives are not only control sequences; they are also
> > > > plain characters, the beginning-of-group and end-of-group tokens, the
> > > > enter-math-mode and exit-math-mode characters, etc.  It's another way of
> > > > putting what he otherwise describes as TeX's "mouth" and "stomach" (perhaps
> > > > not his most appetizing image) --- what other people call  scanning and parsing.
> > 
> > [snip]
> > 
> > I believe that the sentence in question might have been written more
> > precisely as:
> > 
> > 	All other control sequences are defined, ultimately, such that
> > 	the only control sequences are the primitive ones.
> > 
> > Or, more pedantically, as:
> > 
> > 	All other control sequences ultimately expand to a token string
> > 	in which the only control sequences are the primitive ones.
> > 
> > However, I don't believe that a change is warranted.
> > 
> > Chapter 3 is all about control sequences, so "in terms of" is implicitly
> > restricted to control sequences.  This is supported by (and communicated by)
> > the last sentence:
> > 
> > 	For example, \input is a primitive operation, but \' and \" are not;
> > 	the latter are defined in terms of an \accent primitive.
> > 
> > Note that (from plain.tex):
> > 
> > 	\def\'#1{{\accent19 #1}}
> > 	\def\"#1{{\accent"7F #1}}
> > 
> > so the characters 1, 9, 7, F, etc. occurring in the definitions of \' and \"
> > are excluded from consideration.
> > 
> > Sincerely,
> > 
> > 
> > Paul Vojta
> >
> 


