[tex-k] Is this a bug or just an inconsistency?

Sat Mar 7 00:11:21 CET 2020

TeX has the (unusual) ability to define the control sequence with zero letters in its name, also referred to as the null CS.

TeX's WEB source code pre-allocates a special slot for this one particular (weird) control sequence, at |null_cs|, alongside its various CS name tables.

TeX's source and the TeXbook explain that the way to refer to the null CS is with the construction |\csname\endcsname|.  The source code for constructing CS names with these two bracketing primitives expressly returns |null_cs| when there are no characters between them (as opposed to looking up a non-empty name in the hash table or single-character CS table).

And indeed, it works.  Furthermore, the null CS is not frozen.  You can (re)define it as a macro:

\def\csname\endcsname{FOOBAR}

and later invoke it with:

\csname\endcsname

which inserts "FOOBAR" into the layout.  So far so good.

Running under the plain format, 

%%%%%%%%%%%%%%%%%%%%
\endlinechar=-1
foo

bar
\end
%%%%%%%%%%%%%%%%%%%%

inserts the word

   foobar

into the layout without any space between "foo" and "bar".  With |\endlinechar=-1|, no line end character is appended to any input line, so the end of the "foo" line does not turn into a spacer, and the empty line is truly empty: it contributes nothing, rather than being turned into a |\par| command.
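
For contrast, here is a minimal sketch of the same input with |\endlinechar| left at its plain-format default of 13 (this variant is an illustration, not one of the original tests):

%%%%%%%%%%%%%%%%%%%%
% \endlinechar is deliberately left untouched here (default 13)
foo

bar
\end
%%%%%%%%%%%%%%%%%%%%

Here the end of the "foo" line becomes a space and the empty line becomes a |\par|, so "foo" and "bar" land in two separate paragraphs instead of running together.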

And this:

%%%%%%%%%%%%%%%%%%%%
\def\csname\endcsname{FOOBAR}
\endlinechar=-1
foo

\csname\endcsname

bar
\end
%%%%%%%%%%%%%%%%%%%%

does the expected, which is to insert

   fooFOOBARbar

into the layout.  The point being that the null CS has definitely and legally been defined to mean something when expanded.  It is now just another macro that expands to its definition when invoked.  Or it should be.

But if you look through TeX's source code, there is another spot where TeX's scanner returns the null CS, no differently from using |\csname\endcsname| (indeed, the TeXbook on page 47 expressly says they are the same).

While attempting to scan the letters after an escape character, if the scanner discovers there are none left in the current line, it returns |null_cs| (see line 7403 or thereabouts in tex.web).  This can only happen (I believe) when an escape character is the last character on an input line AND the end-of-line character has been disabled with |\endlinechar=-1|.  If it has not been disabled, an end-of-line character is guaranteed to end every line, and so a final escape then starts a single-character control sequence, \^^M or whatever, even though to the user it looks like the escape is the last character of the line.  The code below demonstrates this otherwise invisible distinction.

So my question (and/or demonstration of a bug or inconsistency) is why the following code

%%%%%%%%%%%%%%%%%%%%
\def\csname\endcsname{FOOBAR}
\def\
{NEWLINE}
\endlinechar=-1
foo

\csname\endcsname

\meaning\^^M
\meaning\

bar

\

\end
%%%%%%%%%%%%%%%%%%%%

inserts

   fooFOOBARmacro:->NEWLINEundefinedbar

instead of the expected

   fooFOOBARmacro:->NEWLINEmacro:->FOOBARbar

into the layout, and then issues an "Undefined control sequence" error message for the line ending with the lone "\".

How can this be, when the null CS, which has a dedicated and single spot for storing its definition, is not only defined and has a meaning, but just got successfully recognized as such and expanded?

If you comment out the |\endlinechar=-1| line in the above code snippet, there's no error and the layout has the following in it:

   foo
   FOOBAR
   macro:->NEWLINE macro:->NEWLINE bar
   NEWLINE

which makes sense.

Looking at TeX's source, I'm having trouble figuring out why the null CS's meaning is being treated as |undefined_cs| in this no-line-end case, thereby triggering the error message.  I'm thus unsure where in the source code the bug/inconsistency or my misunderstanding is arising.
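
One thing that may be worth ruling out first, offered as a sketch rather than a diagnosis: |\def| does not expand the token that follows it, so |\def\csname\endcsname{FOOBAR}| may be redefining |\csname| itself as a macro whose parameter text is the delimiter |\endcsname|, leaving |null_cs| untouched.  The variant below is not one of the tests above; it assumes that putting |\expandafter| in front of |\def| lets the |\csname| primitive build the null CS before |\def| sees it:

%%%%%%%%%%%%%%%%%%%%
% Assumption: \expandafter makes the \csname primitive run first,
% so that \def receives the null CS token as the thing to define.
\expandafter\def\csname\endcsname{FOOBAR}
\endlinechar=-1
foo

\csname\endcsname

\meaning\

bar
\end
%%%%%%%%%%%%%%%%%%%%

If that reading is right, this should insert

   fooFOOBARmacro:->FOOBARbar

into the layout, and adding |\message{\meaning\csname}| right after the original |\def| line should report |\csname| as a macro rather than as a primitive.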

Carpe Diem (CS the day),

Doug McKenna
Mathemaesthetics, Inc.