[tex-k] Does TeX's Input Processor Tokenize The Entire Input File First?

Wed Apr 17 19:46:57 CEST 2019

Jon Forrest wrote:

>| I wonder if Knuth were writing TeX today
>| would he use multi-threading?

Before I wrote my own TeX language interpreter for use in my own projects (long story), I also spent a bunch of time reading "TeX by Topic".  It was pretty helpful initially, but once I got into the thick of my re-implementation (as a simple library in C that can be linked into any other program), only Knuth's WEB source could answer my questions.  It took many years of effort to unravel it all and rewrite all the algorithms from scratch.

Time after time, I thought I could clean this up or do something here that Knuth was doing there.  And nearly every time, I would have to backtrack upon realizing that I would be introducing an incompatibility with the way the original engine operated on source code.

But not every time, and there are ways to generalize that don't violate backward compatibility, although there is always a danger of forward incompatibility when generalizing.  Also, there's a lot more memory and speed to play with these days, which means some of TeX's optimizations can be ignored as causing too much complexity.

FWIW, my recollection is that the one error Knuth has said he regrets making in designing TeX was using binary fixed point instead of decimal fixed point.  In other words, instead of the low-order bit of a signed scaled [16:16] fixed-point integer dimension representing 1/65536th of a point, it would have been better to have the smallest unit represent 1/10000th of a point.  Conversion between binary and decimal in the current engine thus causes more round-off error than necessary.  Of course, solving that problem would make everything incompatible, so what's done is done (or what's fixed is done, or what's done is fixed, or ...).
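
To make the round-off concrete, here is a rough sketch in Python (not TeX's own conversion routine, and not code from my library).  The function names are made up for illustration, and it uses Python's built-in rounding, which I'm assuming is close enough to show the effect.  It stores the decimal dimension 0.1pt once in units of 1/65536pt and once in a hypothetical unit of 1/10000pt.

```python
from fractions import Fraction

UNITY = 1 << 16  # units per point in a [16:16] layout (2**16 = 65536)

def to_binary_units(decimal_text: str) -> int:
    """Round a decimal number of points to the nearest 1/65536 pt.

    Hypothetical helper: uses Python's round(), not TeX's own algorithm.
    """
    return round(Fraction(decimal_text) * UNITY)

def to_decimal_units(decimal_text: str, per_point: int = 10000) -> int:
    """Same conversion for a hypothetical unit of 1/10000 pt."""
    return round(Fraction(decimal_text) * per_point)

def binary_error(decimal_text: str) -> Fraction:
    """Exact difference between the stored binary value and the typed text."""
    return Fraction(to_binary_units(decimal_text), UNITY) - Fraction(decimal_text)

if __name__ == "__main__":
    print(to_binary_units("0.1"))      # 0.1 * 65536 = 6553.6, so this must round
    print(binary_error("0.1"))         # what the rounding cost, as an exact fraction
    print(to_decimal_units("0.1"))     # 0.1 * 10000 is a whole number, nothing lost
```

With the binary unit, 0.1pt cannot be stored exactly and has to be rounded to 6554 units; with the decimal unit, any dimension typed with up to four decimal places would be stored exactly.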

Doug McKenna
Mathemaesthetics, Inc.