% Copyright (c) 2007 Hartmut Henkel, Taco Hoekwater % This file is part of luaTeX. % luaTeX is free software; you can redistribute it and/or modify it under the % terms of the GNU General Public License as published by the Free Software % Foundation; either version 2 of the License, or (at your option) any later % version. % luaTeX is distributed in the hope that it will be useful, but WITHOUT ANY % WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR % A PARTICULAR PURPOSE. See the GNU General Public License for more details. % You should have received a copy of the GNU General Public License % along with luaTeX; if not, write to the Free Software Foundation, Inc., % 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA % pdfTeX is copyright (C) 1996-2006 Han The Thanh, . % e-TeX is copyright (C) 1994,98 by Peter Breitenlohner. % This program is directly derived from Donald E. Knuth's TeX; % the change history which follows and the reward offered for finders of % bugs refer specifically to TeX; they should not be taken as referring % to e-TeX, although the change history is relevant in that it % demonstrates the evolutionary path followed. This program is not TeX; % that name is reserved strictly for the program which is the creation % and sole responsibility of Professor Knuth. % Version 0 was released in September 1982 after it passed a variety of tests. % Version 1 was released in November 1983 after thorough testing. % Version 1.1 fixed ``disappearing font identifiers'' et alia (July 1984). % Version 1.2 allowed `0' in response to an error, et alia (October 1984). % Version 1.3 made memory allocation more flexible and local (November 1984). % Version 1.4 fixed accents right after line breaks, et alia (April 1985). % Version 1.5 fixed \the\toks after other expansion in \edefs (August 1985). % Version 2.0 (almost identical to 1.5) corresponds to "Volume B" (April 1986). % Version 2.1 corrected anomalies in discretionary breaks (January 1987). % Version 2.2 corrected "(Please type...)" with null \endlinechar (April 1987). % Version 2.3 avoided incomplete page in premature termination (August 1987). % Version 2.4 fixed \noaligned rules in indented displays (August 1987). % Version 2.5 saved cur_order when expanding tokens (September 1987). % Version 2.6 added 10sp slop when shipping leaders (November 1987). % Version 2.7 improved rounding of negative-width characters (November 1987). % Version 2.8 fixed weird bug if no \patterns are used (December 1987). % Version 2.9 made \csname\endcsname's "relax" local (December 1987). % Version 2.91 fixed \outer\def\a0{}\a\a bug (April 1988). % Version 2.92 fixed \patterns, also file names with complex macros (May 1988). % Version 2.93 fixed negative halving in allocator when mem_min<0 (June 1988). % Version 2.94 kept open_log_file from calling fatal_error (November 1988). % Version 2.95 solved that problem a better way (December 1988). % Version 2.96 corrected bug in "Infinite shrinkage" recovery (January 1989). % Version 2.97 corrected blunder in creating 2.95 (February 1989). % Version 2.98 omitted save_for_after at outer level (March 1989). % Version 2.99 caught $$\begingroup\halign..$$ (June 1989). % Version 2.991 caught .5\ifdim.6... (June 1989). % Version 2.992 introduced major changes for 8-bit extensions (September 1989). % Version 2.993 fixed a save_stack synchronization bug et alia (December 1989). % Version 3.0 fixed unusual displays; was more \output robust (March 1990). % Version 3.1 fixed nullfont, disabled \write{\the\prevgraf} (September 1990). % Version 3.14 fixed unprintable font names and corrected typos (March 1991). % Version 3.141 more of same; reconstituted ligatures better (March 1992). % Version 3.1415 preserved nonexplicit kerns, tidied up (February 1993). % Version 3.14159 allowed fontmemsize to change; bulletproofing (March 1995). % Version 3.141592 fixed \xleaders, glueset, weird alignments (December 2002). % Although considerable effort has been expended to make the luaTeX program % correct and reliable, no warranty is implied; the authors disclaim any % obligation or liability for damages, including but not limited to % special, indirect, or consequential damages arising out of or in % connection with the use or performance of this software. This work has % been a ``labor of love'' and the authors hope that users enjoy it. % Here is TeX material that gets inserted after \input webmac \def\hang{\hangindent 3em\noindent\ignorespaces} \def\hangg#1 {\hang\hbox{#1 }} \def\textindent#1{\hangindent2.5em\noindent\hbox to2.5em{\hss#1 }\ignorespaces} \font\ninerm=cmr9 \let\mc=\ninerm % medium caps for names like SAIL \def\PASCAL{Pascal} \def\pdfTeX{pdf\TeX} \def\pdfeTeX{pdf\eTeX} \def\PDF{PDF} \def\Aleph{Aleph} \def\eTeX{e\TeX} \def\LuaTeX{Lua\TeX} \def\THANH{H\`an Th\^e\llap{\raise 0.5ex\hbox{\'{}}} Th\`anh} \def\ph{\hbox{Pascal-H}} \def\pct!{{\char`\%}} % percent sign in ordinary text \def\grp{\.{\char'173...\char'175}} \font\logo=logo10 % font used for the METAFONT logo \def\MF{{\logo META}\-{\logo FONT}} \def\<#1>{$\langle#1\rangle$} \def\section{\mathhexbox278} \def\(#1){} % this is used to make section names sort themselves better \def\9#1{} % this is used for sort keys in the index via @@:sort key}{entry@@> \outer\def\N#1. \[#2]#3.{\MN#1.\vfil\eject % begin starred section \def\rhead{PART #2:\uppercase{#3}} % define running headline \message{*\modno} % progress report \edef\next{\write\cont{\Z{\?#2]#3}{\modno}{\the\pageno}}}\next \ifon\startsection{\bf\ignorespaces#3.\quad}\ignorespaces} \let\?=\relax % we want to be able to \write a \? \def\title{LuaTeX} \let\maybe=\iffalse % print only changed modules \def\topofcontents{\hsize 5.5in \vglue 0pt plus 1fil minus 1.5in \def\?##1]{\hbox to 1in{\hfil##1.\ }} } \def\botofcontents{\vskip 0pt plus 1fil minus 1.5in} \pageno=3 \def\glob{13} % this should be the section number of "" \def\gglob{20, 26} % this should be the next two sections of "" @* \[1] Introduction. This is LuaTeX, a continuation of $\pdfTeX$ and $\Aleph$. LuaTeX is a document compiler intended to simplify high-quality typesetting for many of the world's languages. It is an extension of D. E. Knuth's \TeX, which was designed essentially for the typesetting of languages using the Latin alphabet. The $\Aleph$ subsystem loosens many of the restrictions imposed by~\TeX: register numbers are no longer limited to 8~bits; fonts may have more than 256~characters; more than 256~fonts may be used; etc. The \PASCAL\ program that follows is the definition of \TeX82, a standard @:PASCAL}{\PASCAL@> @!@:TeX82}{\TeX82@> version of \TeX\ that is designed to be highly portable so that identical output will be obtainable on a great variety of computers. The main purpose of the following program is to explain the algorithms of \TeX\ as clearly as possible. As a result, the program will not necessarily be very efficient when a particular \PASCAL\ compiler has translated it into a particular machine language. However, the program has been written so that it can be tuned to run efficiently in a wide variety of operating environments by making comparatively few changes. Such flexibility is possible because the documentation that follows is written in the \.{WEB} language, which is at a higher level than \PASCAL; the preprocessing step that converts \.{WEB} to \PASCAL\ is able to introduce most of the necessary refinements. Semi-automatic translation to other languages is also feasible, because the program below does not make extensive use of features that are peculiar to \PASCAL. A large piece of software like \TeX\ has inherent complexity that cannot be reduced below a certain level of difficulty, although each individual part is fairly simple by itself. The \.{WEB} language is intended to make the algorithms as readable as possible, by reflecting the way the individual program pieces fit together and by providing the cross-references that connect different parts. Detailed comments about what is going on, and about why things were done in certain ways, have been liberally sprinkled throughout the program. These comments explain features of the implementation, but they rarely attempt to explain the \TeX\ language itself, since the reader is supposed to be familiar with {\sl The \TeX book}. @.WEB@> @:TeXbook}{\sl The \TeX book@> @ The present implementation has a long ancestry, beginning in the summer of~1977, when Michael~F. Plass and Frank~M. Liang designed and coded a prototype @^Plass, Michael Frederick@> @^Liang, Franklin Mark@> @^Knuth, Donald Ervin@> based on some specifications that the author had made in May of that year. This original proto\TeX\ included macro definitions and elementary manipulations on boxes and glue, but it did not have line-breaking, page-breaking, mathematical formulas, alignment routines, error recovery, or the present semantic nest; furthermore, it used character lists instead of token lists, so that a control sequence like \.{\\halign} was represented by a list of seven characters. A complete version of \TeX\ was designed and coded by the author in late 1977 and early 1978; that program, like its prototype, was written in the {\mc SAIL} language, for which an excellent debugging system was available. Preliminary plans to convert the {\mc SAIL} code into a form somewhat like the present ``web'' were developed by Luis Trabb~Pardo and the author at the beginning of 1979, and a complete implementation was created by Ignacio~A. Zabala in 1979 and 1980. The \TeX82 program, which @^Zabala Salelles, Ignacio Andr\'es@> was written by the author during the latter part of 1981 and the early part of 1982, also incorporates ideas from the 1979 implementation of @^Guibas, Leonidas Ioannis@> @^Sedgewick, Robert@> @^Wyatt, Douglas Kirk@> \TeX\ in {\mc MESA} that was written by Leonidas Guibas, Robert Sedgewick, and Douglas Wyatt at the Xerox Palo Alto Research Center. Several hundred refinements were introduced into \TeX82 based on the experiences gained with the original implementations, so that essentially every part of the system has been substantially improved. After the appearance of ``Version 0'' in September 1982, this program benefited greatly from the comments of many other people, notably David~R. Fuchs and Howard~W. Trickey. A final revision in September 1989 extended the input character set to eight-bit codes and introduced the ability to hyphenate words from different languages, based on some ideas of Michael~J. Ferguson. @^Fuchs, David Raymond@> @^Trickey, Howard Wellington@> @^Ferguson, Michael John@> No doubt there still is plenty of room for improvement, but the author is firmly committed to keeping \TeX82 ``frozen'' from now on; stability and reliability are to be its main virtues. On the other hand, the \.{WEB} description can be extended without changing the core of \TeX82 itself, and the program has been designed so that such extensions are not extremely difficult to make. The |banner| string defined here should be changed whenever \TeX\ undergoes any modifications, so that it will be clear which version of \TeX\ might be the guilty party when a problem arises. @^extensions to \TeX@> @^system dependencies@> This program contains code for various features extending \TeX, therefore this program is called `\eTeX' and not `\TeX'; the official name `\TeX' by itself is reserved for software systems that are fully compatible with each other. A special test suite called the ``\.{TRIP} test'' is available for helping to determine whether a particular implementation deserves to be known as `\TeX' [cf.~Stanford Computer Science report CS1027, November 1984]. A similar test suite called the ``\.{e-TRIP} test'' is available for helping to determine whether a particular implementation deserves to be known as `\eTeX'. @d eTeX_version_string=="2.2" {current \eTeX\ version} @d Omega_version_string=="1.15" { \.{\\OmegaVersion} } @d Aleph_version_string=="0.0" { \.{\\AlephVersion} } @d eTeX_version=2 { \.{\\eTeXversion} } @d Omega_version=1 { \.{\\Omegaversion} } @d Aleph_version=0 { \.{\\Alephversion} } @d eTeX_minor_version=1 { \.{\\eTeXminorversion} } @d Omega_minor_version=15 { \.{\\Omegaminorversion} } @d Aleph_minor_version=0 { \.{\\Alephminorversion} } @d eTeX_revision==".2" { \.{\\eTeXrevision} } @d Omega_revision==".15" { \.{\\Omegarevision} } @d Aleph_revision==".0" { \.{\\Alephrevision} } @# @d pdftex_version==200 { \.{\\pdftexversion} } @d pdftex_revision=="0" { \.{\\pdftexrevision} } @d pdftex_version_string=='-2.00.0' {current \pdfTeX\ version} @# @d luatex_version==25 { \.{\\luatexversion} } @d luatex_revision=="4" { \.{\\luatexrevision} } @d luatex_version_string=='snapshot-0.25.4' {current \LuaTeX\ version} @d luatex_date_info==-extra_version_info { the compile date is negated } @# @d luaTeX_banner=='This is LuaTeX, Version ',luatex_version_string,extra_version_info {printed when \LuaTeX\ starts} @# @d banner==luaTeX_banner @# @d TEX==ETEX {change program name into |ETEX|} @ Different \PASCAL s have slightly different conventions, and the present @!@:PASCAL H}{\ph@> program expresses \TeX\ in terms of the \PASCAL\ that was available to the author in 1982. Constructions that apply to this particular compiler, which we shall call \ph, should help the reader see how to make an appropriate interface for other systems if necessary. (\ph\ is Charles Hedrick's modification of a compiler @^Hedrick, Charles Locke@> for the DECsystem-10 that was originally developed at the University of Hamburg; cf.\ {\sl SOFTWARE---Practice \AM\ Experience \bf6} (1976), 29--42. The \TeX\ program below is intended to be adaptable, without extensive changes, to most other versions of \PASCAL, so it does not fully use the admirable features of \ph. Indeed, a conscious effort has been made here to avoid using several idiosyncratic features of standard \PASCAL\ itself, so that most of the code can be translated mechanically into other high-level languages. For example, the `\&{with}' and `\\{new}' features are not used, nor are pointer types, set types, or enumerated scalar types; there are no `\&{var}' parameters, except in the case of files --- \eTeX, however, does use `\&{var}' parameters for the |reverse| function; there are no tag fields on variant records; there are no assignments |real:=integer|; no procedures are declared local to other procedures.) The portions of this program that involve system-dependent code, where changes might be necessary because of differences between \PASCAL\ compilers and/or differences between operating systems, can be identified by looking at the sections whose numbers are listed under `system dependencies' in the index. Furthermore, the index entries for `dirty \PASCAL' list all places where the restrictions of \PASCAL\ have not been followed perfectly, for one reason or another. @!@^system dependencies@> @!@^dirty \PASCAL@> Incidentally, \PASCAL's standard |round| function can be problematical, because it disagrees with the IEEE floating-point standard. Many implementors have therefore chosen to substitute their own home-grown rounding procedure. @ The program begins with a normal \PASCAL\ program heading, whose components will be filled in later, using the conventions of \.{WEB}. @.WEB@> For example, the portion of the program called `\X\glob:Global variables\X' below will be replaced by a sequence of variable declarations that starts in $\section\glob$ of this documentation. In this way, we are able to define each individual global variable when we are prepared to understand what it means; we do not have to define all of the globals at once. Cross references in $\section\glob$, where it says ``See also sections \gglob, \dots,'' also make it possible to look at the set of all global variables, if desired. Similar remarks apply to the other portions of the program heading. Actually the heading shown here is not quite normal: The |program| line does not mention any |output| file, because \ph\ would ask the \TeX\ user to specify a file name if |output| were specified here. @^system dependencies@> @d mtype==t@&y@&p@&e {this is a \.{WEB} coding trick:} @f mtype==type {`\&{mtype}' will be equivalent to `\&{type}'} @f type==true {but `|type|' will not be treated as a reserved word} @p @t\4@>@@/ program TEX; {all file names are defined dynamically} label @@/ const @@/ mtype @@/ var @@/ @# procedure initialize; {this procedure gets things started properly} var @@/ begin @@; end;@# @t\4@>@@/ @t\4@>@@/ @ The overall \TeX\ program begins with the heading just shown, after which comes a bunch of procedure declarations and function declarations. Finally we will get to the main program, which begins with the comment `|start_here|'. If you want to skip down to the main program now, you can look up `|start_here|' in the index. But the author suggests that the best way to understand this program is to follow pretty much the order of \TeX's components as they appear in the \.{WEB} description you are now reading, since the present ordering is intended to combine the advantages of the ``bottom up'' and ``top down'' approaches to the problem of understanding a somewhat complicated system. @ Three labels must be declared in the main program, so we give them symbolic names. @d start_of_TEX=1 {go here when \TeX's variables are initialized} @d end_of_TEX=9998 {go here to close files and terminate gracefully} @d final_end=9999 {this label marks the ending of the program} @= start_of_TEX@t\hskip-2pt@>, end_of_TEX@t\hskip-2pt@>,@,final_end; {key control points} @ Some of the code below is intended to be used only when diagnosing the strange behavior that sometimes occurs when \TeX\ is being installed or when system wizards are fooling around with \TeX\ without quite knowing what they are doing. Such code will not normally be compiled; it is delimited by the codewords `$|debug|\ldots|gubed|$', with apologies to people who wish to preserve the purity of English. Similarly, there is some conditional code delimited by `$|stat|\ldots|tats|$' that is intended for use when statistics are to be kept about \TeX's memory usage. The |stat| $\ldots$ |tats| code also implements diagnostic information for \.{\\tracingparagraphs} and \.{\\tracingpages}. @^debugging@> @d debug==@{ {change this to `$\\{debug}\equiv\null$' when debugging} @d gubed==@t@>@} {change this to `$\\{gubed}\equiv\null$' when debugging} @f debug==begin @f gubed==end @# @d stat==@{ {change this to `$\\{stat}\equiv\null$' when gathering usage statistics} @d tats==@t@>@} {change this to `$\\{tats}\equiv\null$' when gathering usage statistics} @f stat==begin @f tats==end @ This program has two important variations: (1) There is a long and slow version called \.{INITEX}, which does the extra calculations needed to @.INITEX@> initialize \TeX's internal tables; and (2)~there is a shorter and faster production version, which cuts the initialization to a bare minimum. Parts of the program that are needed in (1) but not in (2) are delimited by the codewords `$|init|\ldots|tini|$'. @d init== {change this to `$\\{init}\equiv\.{@@\{}$' in the production version} @d tini== {change this to `$\\{tini}\equiv\.{@@\}}$' in the production version} @f init==begin @f tini==end @= @@/ @!init @@;@+tini @ If the first character of a \PASCAL\ comment is a dollar sign, \ph\ treats the comment as a list of ``compiler directives'' that will affect the translation of this program into machine language. The directives shown below specify full checking and inclusion of the \PASCAL\ debugger when \TeX\ is being debugged, but they cause range checking and other redundant code to be eliminated when the production system is being generated. Arithmetic overflow will be detected in all cases. @^system dependencies@> @^Overflow in arithmetic@> @= @{@&$C-,A+,D-@} {no range check, catch arithmetic overflow, no debug overhead} @!debug @{@&$C+,D+@}@+ gubed {but turn everything on when debugging} @ This \TeX\ implementation conforms to the rules of the {\sl Pascal User @:PASCAL}{\PASCAL@> @^system dependencies@> Manual} published by Jensen and Wirth in 1975, except where system-dependent @^Wirth, Niklaus@> @^Jensen, Kathleen@> code is necessary to make a useful system program, and except in another respect where such conformity would unnecessarily obscure the meaning and clutter up the code: We assume that |case| statements may include a default case that applies if no matching label is found. Thus, we shall use constructions like $$\vbox{\halign{\ignorespaces#\hfil\cr |case x of|\cr 1: $\langle\,$code for $x=1\,\rangle$;\cr 3: $\langle\,$code for $x=3\,\rangle$;\cr |othercases| $\langle\,$code for |x<>1| and |x<>3|$\,\rangle$\cr |endcases|\cr}}$$ since most \PASCAL\ compilers have plugged this hole in the language by incorporating some sort of default mechanism. For example, the \ph\ compiler allows `|others|:' as a default label, and other \PASCAL s allow syntaxes like `\&{else}' or `\&{otherwise}' or `\\{otherwise}:', etc. The definitions of |othercases| and |endcases| should be changed to agree with local conventions. Note that no semicolon appears before |endcases| in this program, so the definition of |endcases| should include a semicolon if the compiler wants one. (Of course, if no default mechanism is available, the |case| statements of \TeX\ will have to be laboriously extended by listing all remaining cases. People who are stuck with such \PASCAL s have, in fact, done this, successfully but not happily!) @d othercases == others: {default for cases not listed explicitly} @d endcases == @+end {follows the default case in an extended |case| statement} @f othercases == else @f endcases == end @ The following parameters can be changed at compile time to extend or reduce \TeX's capacity. They may have different values in \.{INITEX} and in production versions of \TeX. @.INITEX@> @^system dependencies@> @= @!buf_size=500; {maximum number of characters simultaneously present in current lines of open files and in control sequences between \.{\\csname} and \.{\\endcsname}; must not exceed |max_halfword|} @!error_line=72; {width of context lines on terminal error messages} @!half_error_line=42; {width of first lines of contexts in terminal error messages; should be between 30 and |error_line-15|} @!max_print_line=79; {width of longest text lines output; should be at least 60} @!stack_size=200; {maximum number of simultaneous input sources} @!max_in_open=6; {maximum number of input files and error insertions that can be going on simultaneously} @!param_size=60; {maximum number of simultaneous macro parameters} @!nest_size=40; {maximum number of semantic levels simultaneously active} @!max_strings=3000; {maximum number of strings; must not exceed |max_halfword|} @!string_vacancies=8000; {the minimum number of characters that should be available for the user's control sequences and font names, after \TeX's own error messages are stored} @!pool_size=32000; {maximum number of characters in strings, including all error messages and help texts, and the names of all fonts and control sequences; must exceed |string_vacancies| by the total length of \TeX's own strings, which is currently about 23000} @!save_size=600; {space for saving values outside of current group; must be at most |max_halfword|} @!dvi_buf_size=800; {size of the output buffer; must be a multiple of 8} @!expand_depth=10000; {limits recursive calls of the |expand| procedure} @!file_name_size=40; {file names shouldn't be longer than this} @!pool_name='TeXformats:TEX.POOL '; {string of length |file_name_size|; tells where the string pool appears} @.TeXformats@> @!active_mem_size=50000; {number of words of |active_info| for active ocps} @!ocp_maxint=@"10000000; @ Like the preceding parameters, the following quantities can be changed at compile time to extend or reduce \TeX's capacity. But if they are changed, it is necessary to rerun the initialization program \.{INITEX} @.INITEX@> to generate new tables for the production \TeX\ program. One can't simply make helter-skelter changes to the following constants, since certain rather complex initialization numbers are computed from them. They are defined here using \.{WEB} macros, instead of being put into \PASCAL's |const| list, in order to emphasize this distinction. @d font_base=0 {smallest internal font number; must not be less than |min_quarterword|} @d hash_size=2100 {maximum number of control sequences; it should be at most about |(fix_mem_max-fix_mem_min)/10|} @d hash_prime=1777 {a prime number equal to about 85\pct! of |hash_size|} @d ocp_base=0 {smallest internal ocp number; must not be less than |min_quarterword|} @d ocp_biggest=32767 {the real biggest ocp} @d number_ocps=ocp_biggest-ocp_base+1 @d ocp_list_base=0 {smallest internal ocp list number; must not be less than |min_quarterword|} @d ocp_list_biggest=32727 {the real biggest ocp list} @d number_ocp_lists=ocp_list_biggest-ocp_list_base+1 @d max_active_ocp_lists=32768 @d biggest_reg=65535 {the largest allowed register number; must be |<=max_quarterword|} @d number_regs=65536 {|biggest_reg+1|} @d number_attrs=65536 {total numbeer of attributes} @d biggest_mark=65535 {the largest allowed marks class; must be |<=max_quarterword|} @d number_marks=65536 {|biggest_mark+1|} {using |2^20+2^16| characters instead of |2^21| saves 286MB on the virtual memory size of the running executable} @d biggest_char=1114111 {2097151} {the largest allowed character number; must be |<=max_halfword|} @d too_big_char=1114112 {2097152} {|biggest_char+1|} @d special_char=1114113 {2097153} {|biggest_char+2|} @d number_chars=1114112 {2097152} {|biggest_char+1|} @d number_active_chars=65536+65536 {to reduce the memory req. only two planes } @d string_offset=2097152 @d font_biggest=5535 {the real biggest font} @d number_fonts=font_biggest-font_base+1 @d number_math_fonts=768 @d math_font_biggest=767 @d biggest_lang=32767 @d too_big_lang=32768 @d text_size=0 {size code for the largest size in a family} @d script_size=256 {size code for the medium size in a family} @d script_script_size=512 {size code for the smallest size in a family} @ In case somebody has inadvertently made bad settings of the ``constants,'' \TeX\ checks them using a global variable called |bad|. This is the first of many sections of \TeX\ where global variables are defined. @= @!bad:integer; {is some ``constant'' wrong?} @!luainit:boolean; {are we using lua for initializations } @!tracefilenames:boolean; { print file open-close info? } @ Later on we will say `\ignorespaces|if X>=max_halfword then bad:=14|', or something similar. (We can't do that until |max_halfword| has been defined.) @= bad:=0; if not luainit then tracefilenames:=true; if (half_error_line<30)or(half_error_line>error_line-15) then bad:=1; if max_print_line<60 then bad:=2; if dvi_buf_size mod 8<>0 then bad:=3; if hash_prime>hash_size then bad:=5; if max_in_open>=128 then bad:=6; { |if null_list<256 then bad:=7;| } {we will want |null_list>255|} @ Labels are given symbolic names by the following definitions, so that occasional |goto| statements will be meaningful. We insert the label `|exit|' just before the `\ignorespaces|end|\unskip' of a procedure in which we have used the `|return|' statement defined below; the label `|restart|' is occasionally used at the very beginning of a procedure; and the label `|reswitch|' is occasionally used just prior to a |case| statement in which some cases change the conditions and we wish to branch to the newly applicable case. Loops that are set up with the |loop| construction defined below are commonly exited by going to `|done|' or to `|found|' or to `|not_found|', and they are sometimes repeated by going to `|continue|'. If two or more parts of a subroutine start differently but end up the same, the shared code may be gathered together at `|common_ending|'. Incidentally, this program never declares a label that isn't actually used, because some fussy \PASCAL\ compilers will complain about redundant labels. @d exit=10 {go here to leave a procedure} @d restart=20 {go here to start a procedure again} @d reswitch=21 {go here to start a case statement again} @d continue=22 {go here to resume a loop} @d done=30 {go here to exit a loop} @d done1=31 {like |done|, when there is more than one loop} @d done2=32 {for exiting the second loop in a long block} @d done3=33 {for exiting the third loop in a very long block} @d done4=34 {for exiting the fourth loop in an extremely long block} @d done5=35 {for exiting the fifth loop in an immense block} @d done6=36 {for exiting the sixth loop in a block} @d found=40 {go here when you've found it} @d found1=41 {like |found|, when there's more than one per routine} @d found2=42 {like |found|, when there's more than two per routine} @d not_found=45 {go here when you've found nothing} @d not_found1=46 {like |not_found|, when there's more than one} @d not_found2=47 {like |not_found|, when there's more than two} @d not_found3=48 {like |not_found|, when there's more than three} @d not_found4=49 {like |not_found|, when there's more than four} @d common_ending=50 {go here when you want to merge with another branch} @ Here are some macros for common programming idioms. @d incr(#) == #:=#+1 {increase a variable by unity} @d decr(#) == #:=#-1 {decrease a variable by unity} @d negate(#) == #:=-# {change the sign of a variable} @d loop == @+ while true do@+ {repeat over and over until a |goto| happens} @f loop == xclause {\.{WEB}'s |xclause| acts like `\ignorespaces|while true do|\unskip'} @d do_nothing == pdfassert(1) {empty statement} @d return == goto exit {terminate a procedure call} @f return == nil @d empty=0 {symbolic name for a null constant} @* \[2] The character set. In order to make \TeX\ readily portable to a wide variety of computers, all of its input text is converted to an internal twenty-one-bit code that covers all of unicode, including ASCII, the ``American Standard Code for Information Interchange.'' This conversion is done immediately when each character is read in. Conversely, characters are converted from ASCII to the user's external representation just before they are output to a text file. Such an internal code is relevant to users of \TeX\ primarily because it governs the positions of characters in the fonts. For example, the character `\.A' has ASCII code $65=@'101$, and when \TeX\ typesets this letter it specifies character number 65 in the current font. If that font actually has `\.A' in a different position, \TeX\ doesn't know what the real position is; the program that does the actual printing from \TeX's device-independent files is responsible for converting from ASCII to a particular font encoding. @^ASCII code@> \TeX's internal code also defines the value of constants that begin with a reverse apostrophe; and it provides an index to the \.{\\catcode}, \.{\\mathcode}, \.{\\uccode}, \.{\\lccode}, and \.{\\delcode} tables. @ Characters of text that have been converted to \TeX's internal form are said to be of type |ASCII_code|, which is a subrange of the integers. @= @!ASCII_code=0..biggest_char; {eight-bit numbers} @!BMP_code=0..65535; {sixteen-bit numbers} @ The original \PASCAL\ compiler was designed in the late 60s, when six-bit character sets were common, so it did not make provision for lowercase letters. Nowadays, of course, we need to deal with both capital and small letters in a convenient way, especially in a program for typesetting; so the present specification of \TeX\ has been written under the assumption that the \PASCAL\ compiler and run-time system permit the use of text files with more than 64 distinguishable characters. More precisely, we assume that the character set contains at least the letters and symbols associated with ASCII codes @'40 through @'176; all of these characters are now available on most computer terminals. Since we are dealing with more characters than were present in the first \PASCAL\ compilers, we have to decide what to call the associated data type. Some \PASCAL s use the original name |char| for the characters in text files, even though there now are more than 64 such characters, while other \PASCAL s consider |char| to be a 64-element subrange of a larger data type that has some other name. In order to accommodate this difference, we shall use the name |text_char| to stand for the data type of the characters that are converted to and from |ASCII_code| when they are input and output. We shall also assume that |text_char| consists of the elements |chr(first_text_char)| through |chr(last_text_char)|, inclusive. The following definitions should be adjusted if necessary. @^system dependencies@> @d text_char == char {the data type of characters in text files} @d first_text_char=0 {ordinal number of the smallest element of |text_char|} @d last_text_char=255 {ordinal number of the largest element of |text_char|} @= @!i:integer; @ The \TeX\ processor converts between ASCII code and the user's external character set by means of arrays |xord| and |xchr| that are analogous to \PASCAL's |ord| and |chr| functions. @= @!xchr: array [ASCII_code] of text_char; {specifies conversion of output characters} @ Since we are assuming that our \PASCAL\ system is able to read and write the visible characters of standard ASCII (although not necessarily using the ASCII codes to represent them), the following assignment statements initialize the standard part of the |xchr| array properly, without needing any system-dependent changes. On the other hand, it is possible to implement \TeX\ with less complete character sets, and in such cases it will be necessary to change something here. @^system dependencies@> @= xchr[@'40]:=' '; xchr[@'41]:='!'; xchr[@'42]:='"'; xchr[@'43]:='#'; xchr[@'44]:='$'; xchr[@'45]:='%'; xchr[@'46]:='&'; xchr[@'47]:='''';@/ xchr[@'50]:='('; xchr[@'51]:=')'; xchr[@'52]:='*'; xchr[@'53]:='+'; xchr[@'54]:=','; xchr[@'55]:='-'; xchr[@'56]:='.'; xchr[@'57]:='/';@/ xchr[@'60]:='0'; xchr[@'61]:='1'; xchr[@'62]:='2'; xchr[@'63]:='3'; xchr[@'64]:='4'; xchr[@'65]:='5'; xchr[@'66]:='6'; xchr[@'67]:='7';@/ xchr[@'70]:='8'; xchr[@'71]:='9'; xchr[@'72]:=':'; xchr[@'73]:=';'; xchr[@'74]:='<'; xchr[@'75]:='='; xchr[@'76]:='>'; xchr[@'77]:='?';@/ xchr[@'100]:='@@'; xchr[@'101]:='A'; xchr[@'102]:='B'; xchr[@'103]:='C'; xchr[@'104]:='D'; xchr[@'105]:='E'; xchr[@'106]:='F'; xchr[@'107]:='G';@/ xchr[@'110]:='H'; xchr[@'111]:='I'; xchr[@'112]:='J'; xchr[@'113]:='K'; xchr[@'114]:='L'; xchr[@'115]:='M'; xchr[@'116]:='N'; xchr[@'117]:='O';@/ xchr[@'120]:='P'; xchr[@'121]:='Q'; xchr[@'122]:='R'; xchr[@'123]:='S'; xchr[@'124]:='T'; xchr[@'125]:='U'; xchr[@'126]:='V'; xchr[@'127]:='W';@/ xchr[@'130]:='X'; xchr[@'131]:='Y'; xchr[@'132]:='Z'; xchr[@'133]:='['; xchr[@'134]:='\'; xchr[@'135]:=']'; xchr[@'136]:='^'; xchr[@'137]:='_';@/ xchr[@'140]:='`'; xchr[@'141]:='a'; xchr[@'142]:='b'; xchr[@'143]:='c'; xchr[@'144]:='d'; xchr[@'145]:='e'; xchr[@'146]:='f'; xchr[@'147]:='g';@/ xchr[@'150]:='h'; xchr[@'151]:='i'; xchr[@'152]:='j'; xchr[@'153]:='k'; xchr[@'154]:='l'; xchr[@'155]:='m'; xchr[@'156]:='n'; xchr[@'157]:='o';@/ xchr[@'160]:='p'; xchr[@'161]:='q'; xchr[@'162]:='r'; xchr[@'163]:='s'; xchr[@'164]:='t'; xchr[@'165]:='u'; xchr[@'166]:='v'; xchr[@'167]:='w';@/ xchr[@'170]:='x'; xchr[@'171]:='y'; xchr[@'172]:='z'; xchr[@'173]:='{'; xchr[@'174]:='|'; xchr[@'175]:='}'; xchr[@'176]:='~';@/ @ Some of the ASCII codes without visible characters have been given symbolic names in this program because they are used with a special meaning. @d null_code=@'0 {ASCII code that might disappear} @d carriage_return=@'15 {ASCII code used at end of line} @d invalid_code=@'177 {ASCII code that many systems prohibit in text files} @ The ASCII code is ``standard'' only to a certain extent, since many computer installations have found it advantageous to have ready access to more than 94 printing characters. Appendix~C of {\sl The \TeX book\/} gives a complete specification of the intended correspondence between characters and \TeX's internal representation. @:TeXbook}{\sl The \TeX book@> If \TeX\ is being used on a garden-variety \PASCAL\ for which only standard ASCII codes will appear in the input and output files, it doesn't really matter what codes are specified in |xchr[0..@'37]|, but the safest policy is to blank everything out by using the code shown below. However, other settings of |xchr| will make \TeX\ more friendly on computers that have an extended character set, so that users can type things like `\.^^Z' instead of `\.{\\ne}'. People with extended character sets can assign codes arbitrarily, giving an |xchr| equivalent to whatever characters the users of \TeX\ are allowed to have in their input files. It is best to make the codes correspond to the intended interpretations as shown in Appendix~C whenever possible; but this is not necessary. For example, in countries with an alphabet of more than 26 letters, it is usually best to map the additional letters into codes less than~@'40. To get the most ``permissive'' character set, change |' '| on the right of these assignment statements to |chr(i)|. @^character set dependencies@> @^system dependencies@> @= for i:=0 to biggest_char do xchr[i]:=i; @ The following system-independent code makes the |xord| array contain a suitable inverse to the information in |xchr|. Note that if |xchr[i]=xchr[j]| where |i= @* \[3] Input and output. The bane of portability is the fact that different operating systems treat input and output quite differently, perhaps because computer scientists have not given sufficient attention to this problem. People have felt somehow that input and output are not part of ``real'' programming. Well, it is true that some kinds of programming are more fun than others. With existing input/output conventions being so diverse and so messy, the only sources of joy in such parts of the code are the rare occasions when one can find a way to make the program a little less bad than it might have been. We have two choices, either to attack I/O now and get it over with, or to postpone I/O until near the end. Neither prospect is very attractive, so let's get it over with. The basic operations we need to do are (1)~inputting and outputting of text, to or from a file or the user's terminal; (2)~inputting and outputting of eight-bit bytes, to or from a file; (3)~instructing the operating system to initiate (``open'') or to terminate (``close'') input or output from a specified file; (4)~testing whether the end of an input file has been reached. \TeX\ needs to deal with two kinds of files. We shall use the term |alpha_file| for a file that contains textual data, and the term |byte_file| for a file that contains eight-bit binary information. These two types turn out to be the same on many computers, but sometimes there is a significant distinction, so we shall be careful to distinguish between them. Standard protocols for transferring such files from computer to computer, via high-speed networks, are now becoming available to more and more communities of users. The program actually makes use also of a third kind of file, called a |word_file|, when dumping and reloading base information for its own initialization. We shall define a word file later; but it will be possible for us to specify simple operations on word files before they are defined. @= @!eight_bits=0..65535; {unsigned one-byte quantity} @!real_eight_bits=0..255; {unsigned one-byte quantity} @!alpha_file=packed file of text_char; {files that contain textual data} @!byte_file=packed file of real_eight_bits; {files that contain binary data} @ Most of what we need to do with respect to input and output can be handled by the I/O facilities that are standard in \PASCAL, i.e., the routines called |get|, |put|, |eof|, and so on. But standard \PASCAL\ does not allow file variables to be associated with file names that are determined at run time, so it cannot be used to implement \TeX; some sort of extension to \PASCAL's ordinary |reset| and |rewrite| is crucial for our purposes. We shall assume that |nameoffile| is a variable of an appropriate type such that the \PASCAL\ run-time system being used to implement \TeX\ can open a file whose external name is specified by |nameoffile|. @^system dependencies@> @= @!nameoffile:packed array[1..file_name_size] of char;@;@/ {on some systems this may be a \&{record} variable} @!namelength:0..file_name_size;@/{this many characters are actually relevant in |nameoffile| (the rest are blank)} @!name_file_pointer:alpha_file; @ The \ph\ compiler with which the present version of \TeX\ was prepared has extended the rules of \PASCAL\ in a very convenient way. To open file~|f|, we can write $$\vbox{\halign{#\hfil\qquad&#\hfil\cr |reset(f,@t\\{name}@>,'/O')|&for input;\cr |rewrite(f,@t\\{name}@>,'/O')|&for output.\cr}}$$ The `\\{name}' parameter, which is of type `{\bf packed array $[\langle\\{any}\rangle]$ of \\{char}}', stands for the name of the external file that is being opened for input or output. Blank spaces that might appear in \\{name} are ignored. The `\.{/O}' parameter tells the operating system not to issue its own error messages if something goes wrong. If a file of the specified name cannot be found, or if such a file cannot be opened for some other reason (e.g., someone may already be trying to write the same file), we will have |@!erstat(f)<>0| after an unsuccessful |reset| or |rewrite|. This allows \TeX\ to undertake appropriate corrective action. @:PASCAL H}{\ph@> @^system dependencies@> \TeX's file-opening procedures return |false| if no file identified by |nameoffile| could be opened. @d reset_OK(#)==erstat(#)=0 @d rewrite_OK(#)==erstat(#)=0 @p function a_open_in(var f:alpha_file):boolean; {open a text file for input} begin reset(f,nameoffile,'/O'); a_open_in:=reset_OK(f); end; @# function a_open_out(var f:alpha_file):boolean; {open a text file for output} begin rewrite(f,nameoffile,'/O'); a_open_out:=rewrite_OK(f); end; @# function b_open_in(var f:byte_file):boolean; {open a binary file for input} begin reset(f,nameoffile,'/O'); b_open_in:=reset_OK(f); end; @# function b_open_out(var f:byte_file):boolean; {open a binary file for output} begin rewrite(f,nameoffile,'/O'); b_open_out:=rewrite_OK(f); end; @# function w_open_in(var f:word_file):boolean; {open a word file for input} begin reset(f,nameoffile,'/O'); w_open_in:=reset_OK(f); end; @# function w_open_out(var f:word_file):boolean; {open a word file for output} begin rewrite(f,nameoffile,'/O'); w_open_out:=rewrite_OK(f); end; @ When input files are opened via a callback, they will also be read using callbacks. for that purpose, the |open_read_file_callback| returns an integer to uniquely identify a callback table. This id replaces the file point |f| in this case, because the input does not have to be a file in the traditional sense. Signalling this fact is achieved by having two arrays of integers. @= @!input_file_callback_id : ^integer; @!read_file_callback_id : array[0..16] of integer; @ @p function lua_a_open_in(var f:alpha_file; n:quarterword):boolean; var k:integer; { a temporary value } fnam:str_number; { string returned by find callback } callback_id:integer; file_ok:boolean; { the status so far } begin file_ok:=true; if n=0 then begin texinputtype := 1; {Tell |open_input| we are \.{\\input}.} input_file_callback_id[index] := 0; end else begin texinputtype:=0; read_file_callback_id[n] := 0; end; callback_id:=callback_defined(find_read_file_callback); if callback_id>0 then begin fnam:=0; file_ok:=run_callback(callback_id,'dS->s',n,stringcast(nameoffile+1),addressof(fnam)); if (file_ok)and(fnam<>0)and(length(fnam)>0) then begin @; end else file_ok:=false; {file not found} end; if file_ok then begin callback_id:=callback_defined(open_read_file_callback); if callback_id>0 then begin k := run_and_save_callback(callback_id,'S->',stringcast(nameoffile+1)); if k>0 then begin lua_a_open_in := true; if n=0 then input_file_callback_id[index] := k else read_file_callback_id[n] := k; end else file_ok:=false; {read failed} end else begin {no read callback} if openinnameok(stringcast(nameoffile+1)) then begin lua_a_open_in := a_open_in(f,kpsetexformat); name_file_pointer := f; end else file_ok:=false; {open failed} end; end; if not file_ok then begin lua_a_open_in := false; name_file_pointer := 0; end; end; @# function lua_a_open_out(var f:alpha_file; n:quarterword):boolean; var test:boolean; k:integer; fnam:str_number; callback_id:integer; begin name_file_pointer := 0; callback_id:=callback_defined(find_write_file_callback); if callback_id>0 then begin fnam:=0; test:=run_callback(callback_id,'dS->s',n,stringcast(nameoffile+1),addressof(fnam)); if (test)and(fnam<>0)and(length(fnam)>0) then begin @; lua_a_open_out := do_a_open_out(f); name_file_pointer := f; end else lua_a_open_out := false; end else begin if openoutnameok(stringcast(nameoffile+1)) then begin lua_a_open_out := a_open_out(f); name_file_pointer := f; end else lua_a_open_out := false; end; end; function lua_b_open_out(var f:alpha_file):boolean; var test:boolean; k:integer; fnam:str_number; callback_id:integer; begin name_file_pointer := 0; callback_id:=callback_defined(find_output_file_callback); if callback_id>0 then begin fnam:=0; test:=run_callback(callback_id,'S->s',stringcast(nameoffile+1),addressof(fnam)); if (test)and(fnam<>0)and(length(fnam)>0) then begin @; lua_b_open_out := do_b_open_out(f); name_file_pointer := f; end else lua_b_open_out := false; end else begin if openoutnameok(stringcast(nameoffile+1)) then begin lua_b_open_out := b_open_out(f); name_file_pointer := f; end else lua_b_open_out := false; end; end; @ @= libcfree (nameoffile); nameoffile := xmallocarray (packed_ASCII_code, length(fnam)+2); for k:=str_start_macro(fnam) to str_start_macro(fnam+1)-1 do nameoffile[k-str_start_macro(fnam)+1] := str_pool[k]; nameoffile[length(fnam)+1]:=0; namelength := length(fnam); flush_string @ Files can be closed with the \ph\ routine `|close(f)|', which @^system dependencies@> should be used when all input or output with respect to |f| has been completed. This makes |f| available to be opened again, if desired; and if |f| was used for output, the |close| operation makes the corresponding external file appear on the user's area, ready to be read. These procedures should not generate error messages if a file is being closed before it has been successfully opened. @p procedure a_close(var f:alpha_file); {close a text file} begin close(f); end; @# procedure b_close(var f:byte_file); {close a binary file} begin close(f); end; @# procedure w_close(var f:word_file); {close a word file} begin close(f); end; @ @p procedure lua_a_close_in(var f:alpha_file; n:quarterword); {close a text file} var ret:boolean; callback_id:integer; begin if n=0 then callback_id:=input_file_callback_id[index] else callback_id:=read_file_callback_id[n]; if callback_id>0 then begin ret:=run_saved_callback(callback_id,'close','->'); destroy_saved_callback(callback_id); if n=0 then input_file_callback_id[index] := 0 else read_file_callback_id[n] := 0; end else a_close(f); end; @# procedure lua_a_close_out(var f:alpha_file); {close a text file} begin a_close(f); end; @ Binary input and output are done with \PASCAL's ordinary |get| and |put| procedures, so we don't have to make any other special arrangements for binary~I/O. Text output is also easy to do with standard \PASCAL\ routines. The treatment of text input is more difficult, however, because of the necessary translation to |ASCII_code| values. \TeX's conventions should be efficient, and they should blend nicely with the user's operating environment. @ Input from text files is read one line at a time, using a routine called |lua_input_ln|. This function is defined in terms of global variables called |buffer|, |first|, and |last| that will be described in detail later; for now, it suffices for us to know that |buffer| is an array of |ASCII_code| values, and that |first| and |last| are indices into this array representing the beginning and ending of a line of text. @= @!buffer:array[0..buf_size] of packed_ASCII_code; {lines of characters being read} @!first:0..buf_size; {the first unused position in |buffer|} @!last:0..buf_size; {end of the line just input to |buffer|} @!max_buf_stack:0..buf_size; {largest index used in |buffer|} @ The |lua_input_ln| function brings the next line of input from the specified file into available positions of the buffer array and returns the value |true|, unless the file has already been entirely read, in which case it returns |false| and sets |last:=first|. In general, the |ASCII_code| numbers that represent the next line of the file are input into |buffer[first]|, |buffer[first+1]|, \dots, |buffer[last-1]|; and the global variable |last| is set equal to |first| plus the length of the line. Trailing blanks are removed from the line; thus, either |last=first| (in which case the line was entirely blank) or |buffer[last-1]<>" "|. An overflow error is given, however, if the normal actions of |lua_input_ln| would make |last>=buf_size|; this is done so that other parts of \TeX\ can safely look at the contents of |buffer[last+1]| without overstepping the bounds of the |buffer| array. Upon entry to |lua_input_ln|, the condition |first @p function input_ln(var f:alpha_file;@!bypass_eoln:boolean):boolean; {inputs the next line or returns |false|} var last_nonblank:0..buf_size; {|last| with trailing blanks removed} begin if bypass_eoln then if not eof(f) then get(f); {input the first character of the line into |f^|} last:=first; {cf.\ Matthew 19\thinspace:\thinspace30} if eof(f) then input_ln:=false else begin last_nonblank:=first; while not eoln(f) do begin if last>=max_buf_stack then begin max_buf_stack:=last+1; if max_buf_stack=buf_size then @; end; buffer[last]:=f^; get(f); incr(last); if buffer[last-1]<>" " then last_nonblank:=last; end; last:=last_nonblank; input_ln:=true; end; end; function lua_input_ln(var f:alpha_file;n:quarterword;@!bypass_eoln:boolean):boolean; var lua_result:boolean; last_ptr:integer; callback_id:integer; bypass_p:boolean; begin if n=0 then callback_id:=input_file_callback_id[index] else callback_id:=read_file_callback_id[n]; if callback_id>0 then begin last:=first; last_ptr := first; lua_result := run_saved_callback(callback_id,'reader','->l', addressof(last_ptr)); if (lua_result=true)and(last_ptr<>0) then begin last := last_ptr; if last>max_buf_stack then max_buf_stack:=last; lua_input_ln := true; @; end else lua_input_ln := false; end else begin bypass_p := bypass_eoln; {this is for -Wextra} lua_result := input_ln(f,bypass_p); if lua_result=true then begin lua_input_ln := true; @; end else lua_input_ln := false; end end; @ @= if last>=first then begin callback_id := callback_defined(process_input_buffer_callback); if callback_id>0 then begin last_ptr := first; lua_result := run_callback(callback_id, 'l->l', (last-first), addressof(last_ptr)); if (lua_result=true)and(last_ptr<>0) then begin last := last_ptr; if last>max_buf_stack then max_buf_stack:=last; end; end; end @ The user's terminal acts essentially like other files of text, except that it is used both for input and for output. When the terminal is considered an input file, the file variable is called |term_in|, and when it is considered an output file the file variable is |term_out|. @^system dependencies@> @= @!term_in:alpha_file; {the terminal as an input file} @!term_out:alpha_file; {the terminal as an output file} @ Here is how to open the terminal files in \ph. The `\.{/I}' switch suppresses the first |get|. @^system dependencies@> @d t_open_in==reset(term_in,'TTY:','/O/I') {open the terminal for text input} @d t_open_out==rewrite(term_out,'TTY:','/O') {open the terminal for text output} @ Sometimes it is necessary to synchronize the input/output mixture that happens on the user's terminal, and three system-dependent procedures are used for this purpose. The first of these, |update_terminal|, is called when we want to make sure that everything we have output to the terminal so far has actually left the computer's internal buffers and been sent. The second, |clear_terminal|, is called when we wish to cancel any input that the user may have typed ahead (since we are about to issue an unexpected error message). The third, |wake_up_terminal|, is supposed to revive the terminal if the user has disabled it by some instruction to the operating system. The following macros show how these operations can be specified in \ph: @^system dependencies@> @d update_terminal == break(term_out) {empty the terminal output buffer} @d clear_terminal == break_in(term_in,true) {clear the terminal input buffer} @d wake_up_terminal == do_nothing {cancel the user's cancellation of output} @ We need a special routine to read the first line of \TeX\ input from the user's terminal. This line is different because it is read before we have opened the transcript file; there is sort of a ``chicken and egg'' problem here. If the user types `\.{\\input paper}' on the first line, or if some macro invoked by that line does such an \.{\\input}, the transcript file will be named `\.{paper.log}'; but if no \.{\\input} commands are performed during the first line of terminal input, the transcript file will acquire its default name `\.{texput.log}'. (The transcript file will not contain error messages generated by the first line before the first \.{\\input} command.) @.texput@> The first line is even more special if we are lucky enough to have an operating system that treats \TeX\ differently from a run-of-the-mill \PASCAL\ object program. It's nice to let the user start running a \TeX\ job by typing a command line like `\.{tex paper}'; in such a case, \TeX\ will operate as if the first line of input were `\.{paper}', i.e., the first line will consist of the remainder of the command line, after the part that invoked \TeX. The first line is special also because it may be read before \TeX\ has input a format file. In such cases, normal error messages cannot yet be given. The following code uses concepts that will be explained later. (If the \PASCAL\ compiler does not support non-local |@!goto|\unskip, the @^system dependencies@> statement `|goto final_end|' should be replaced by something that quietly terminates the program.) @= if format_ident=0 then begin writeln(term_out,'Buffer size exceeded!'); goto final_end; @.Buffer size exceeded@> end else check_buffer_overflow(buf_size+10) { need a little bit more} @ Different systems have different ways to get started. But regardless of what conventions are adopted, the routine that initializes the terminal should satisfy the following specifications: \yskip\textindent{1)}It should open file |term_in| for input from the terminal. (The file |term_out| will already be open for output to the terminal.) \textindent{2)}If the user has given a command line, this line should be considered the first line of terminal input. Otherwise the user should be prompted with `\.{**}', and the first line of input should be whatever is typed in response. \textindent{3)}The first line of input, which might or might not be a command line, should appear in locations |first| to |last-1| of the |buffer| array. \textindent{4)}The global variable |loc| should be set so that the character to be read next by \TeX\ is in |buffer[loc]|. This character should not be blank, and we should have |loc @p function init_terminal:boolean; {gets the terminal input started} label exit; begin t_open_in; loop@+begin wake_up_terminal; write(term_out,'**'); update_terminal; @.**@> if not input_ln(term_in,true) then {this shouldn't happen} begin writeln(term_out); write(term_out,'! End of file on the terminal... why?'); @.End of file on the terminal@> init_terminal:=false; return; end; loc:=first; while (loc which converts single-character strings into the ASCII code number of the single character involved, while it converts other strings into integers and builds a string pool file. Thus, when the string constant \.{"."} appears in the program below, \.{WEB} converts it into the integer 46, which is the ASCII code for a period, while \.{WEB} will convert a string like \.{"hello"} into some integer greater than~|biggest_char|. Some \PASCAL\ compilers won't pack integers into a single byte unless the integers lie in the range |-128..127|. To accommodate such systems we access the string pool only via macros that can easily be redefined. @d si(#) == # {convert from |ASCII_code| to |packed_ASCII_code|} @d so(#) == # {convert from |packed_ASCII_code| to |ASCII_code|} @d str_start_macro(#) == str_start[(#) - string_offset] @= @!pool_pointer = integer; {for variables that point into |str_pool|} @!str_number = integer; {for variables that point into |str_start|} @!packed_ASCII_code = 0..255; {elements of |str_pool| array} @ @= @!str_pool:packed array[pool_pointer] of packed_ASCII_code; {the characters} @!str_start : array[str_number] of pool_pointer; {the starting pointers} @!pool_ptr : pool_pointer; {first unused position in |str_pool|} @!str_ptr : str_number; {number of the current string being created} @!init_pool_ptr : pool_pointer; {the starting value of |pool_ptr|} @!init_str_ptr : str_number; {the starting value of |str_ptr|} @ Several of the elementary string operations are performed using \.{WEB} macros instead of \PASCAL\ procedures, because many of the operations are done quite frequently and we want to avoid the overhead of procedure calls. For example, here is a simple macro that computes the length of a string. @.WEB@> @d length(#)==(str_start_macro(#+1)-str_start_macro(#)) {the number of characters in string number \#} @ The length of the current string is called |cur_length|: @d cur_length == (pool_ptr - str_start_macro(str_ptr)) @ Strings are created by appending character codes to |str_pool|. The |append_char| macro, defined here, does not check to see if the value of |pool_ptr| has gotten too high; this test is supposed to be made before |append_char| is used. There is also a |flush_char| macro, which erases the last character appended. To test if there is room to append |l| more characters to |str_pool|, we shall write |str_room(l)|, which aborts \TeX\ and gives an apologetic error message if there isn't enough room. @d append_char(#) == {put |ASCII_code| \# at the end of |str_pool|} begin str_pool[pool_ptr]:=si(#); incr(pool_ptr); end @d flush_char == decr(pool_ptr) {forget the last character in the pool} @d str_room(#) == check_pool_overflow((pool_ptr+#)) {test or grow the pool} @p procedure string_room(t:integer); begin str_room(t); end; @# procedure append_pool_char(c:ASCII_code); begin append_char(c); end; @ Once a sequence of characters has been appended to |str_pool|, it officially becomes a string when the function |make_string| is called. This function returns the identification number of the new string as its value. @p function make_string : str_number; {current string enters the pool} begin if str_ptr=(max_strings+string_offset) then overflow("number of strings",max_strings-init_str_ptr); @:TeX capacity exceeded number of strings}{\quad number of strings@> incr(str_ptr); str_start_macro(str_ptr):=pool_ptr; make_string:=str_ptr-1; end; @ To destroy the most recently made string, we say |flush_string|. The helper routines reads one utf sequence from the pool resp. the buffer, and returns its character value. @d flush_string==begin decr(str_ptr); pool_ptr:=str_start_macro(str_ptr); end @d test_sequence_byte(#)==if (#<@"80) or (#>=@"C0) then begin print_err("Text line contains an invalid utf-8 sequence"); @.Text line contains...@> help2("A funny symbol that I can't read has just been input.")@/ ("Just continue, I'll change it to 0xFFFD.");@/ deletions_allowed:=false; error; deletions_allowed:=true; {the assigment before return is needed because web2c generates incorrect code} a:=@"FFFD; buffer_to_unichar:=a; return; end @p function buffer_to_unichar(@!k:integer):integer; label exit; var a:@!integer; {a utf char} b:@!integer; { a utf nibble} begin b := buffer[k]; if b<@"80 then a := b else if b>=@"F0 then begin a := (b-@"F0) * 64; b := buffer[k+1]; test_sequence_byte(b); a := (a + (b-128)) * 64; b := buffer[k+2]; test_sequence_byte(b); a := (a + (b-128)) * 64; b := buffer[k+3]; test_sequence_byte(b); a := a + (b-128); end else if b>=@"E0 then begin a := (b-@"E0) * 64; b := buffer[k+1]; test_sequence_byte(b); a := (a + (b-128)) * 64; b := buffer[k+2]; test_sequence_byte(b); a := a + (b-128); end else if b>=@"C0 then begin a := (b-@"C0) * 64; b := buffer[k+1]; test_sequence_byte(b); a := a + (b-128); end else begin { NI: this is an encoding error } print_err("Buffer contains an invalid utf-8 sequence"); @.Text line contains...@> help2("A funny symbol somehow ended up in the input buffer.")@/ ("Just continue, I'll change it to 0xFFFD.");@/ deletions_allowed:=false; error; deletions_allowed:=true; a:=@"FFFD; end; exit: buffer_to_unichar := a; end ; @ The following subroutine compares string |s| with another string of the same length that appears in |buffer| starting at position |k|; the result is |true| if and only if the strings are equal. Empirical tests indicate that |str_eq_buf| is used in such a way that it tends to return |true| about 80 percent of the time. @p function str_eq_buf(@!s:str_number;@!k:integer):boolean; {test equality of strings} label not_found; {loop exit} var j: pool_pointer; {running index} a: ASCII_code; {a unicode character } @!result: boolean; {result of comparison} begin result:=false; if ss then goto not_found; end else begin j:=str_start_macro(s); while jbuffer[k] then begin goto not_found; end; incr(j); incr(k); end; end; result:=true; not_found: str_eq_buf:=result; end; @ Here is a similar routine, but it compares two strings in the string pool, and it does not assume that they have the same length. @p function str_eq_str(@!s,@!t:str_number):boolean; {test equality of strings} label found,not_found; {loop exit} var j,@!k: pool_pointer; {running indices} a: ASCII_code; {a utf char} @!result: boolean; {result of comparison} begin result:=false; a:= 0; if s=string_offset then begin if s<=@"7F then if length(t)=1 then if str_pool[str_start_macro(t)]=s then goto found; a := pool_to_unichar(str_start_macro(t)); if a<>s then goto not_found; end else if t<>s then goto not_found; end else if tt then goto not_found; end else begin if length(s)<>length(t) then goto not_found; j:=str_start_macro(s); k:=str_start_macro(t); while jstr_pool[k] then goto not_found; incr(j); incr(k); end; end; found: result:=true; not_found: str_eq_str:=result; end; @ The initial values of |str_pool|, |str_start|, |pool_ptr|, and |str_ptr| are computed by the \.{INITEX} program, based in part on the information that \.{WEB} has output while processing \TeX. @.INITEX@> @^string pool@> The first |string_offset| strings are single-characters strings matching Unicode. There is no point in generating all of these. But |str_ptr| has initialized properly, otherwise |print_char| cannot see the difference between characters and strings. @p @!init function get_strings_started:boolean; {initializes the string pool, but returns |false| if something goes wrong} label done,exit; var @!g:str_number; {garbage} begin pool_ptr:=0; str_ptr:=string_offset; str_start[0]:=0; @; exit:end; tini @ @d app_lc_hex(#)==l:=#; if l<10 then append_char(l+"0")@+else append_char(l-10+"a") @ The first 128 strings will contain 95 standard ASCII characters, and the other 33 characters will be printed in three-symbol form like `\.{\^\^A}' unless a system-dependent change is made here. Installations that have an extended character set, where for example |xchr[@'32]=@t\.{\'^^Z\'}@>|, would like string @'32 to be the single character @'32 instead of the three characters @'136, @'136, @'132 (\.{\^\^Z}). On the other hand, even people with an extended character set will want to represent string @'15 by \.{\^\^M}, since @'15 is |carriage_return|; the idea is to produce visible strings instead of tabs or line-feeds or carriage-returns or bell-rings or characters that are treated anomalously in text files. Unprintable characters of codes 128--255 are, similarly, rendered \.{\^\^80}--\.{\^\^ff}. Unprintable characters of codes 256--|65535| are, similarly, rendered \.{\^\^\^\^0100}--\.{\^\^\^\^ffff}. The boolean expression defined here should be |true| unless \TeX\ internal code number~|k| corresponds to a non-troublesome visible symbol in the local character set. An appropriate formula for the extended character set recommended in {\sl The \TeX book\/} would, for example, be `|k in [0,@'10..@'12,@'14,@'15,@'33,@'177..@'377]|'. If character |k| cannot be printed, and |k<@'200|, then character |k+@'100| or |k-@'100| must be printable; moreover, ASCII codes |[@'41..@'46, @'60..@'71, @'141..@'146, @'160..@'171]| must be printable. Thus, at least 80 printable characters are needed. @:TeXbook}{\sl The \TeX book@> @^character set dependencies@> @^system dependencies@> @ @= g := loadpoolstrings((pool_size-string_vacancies)); if g=0 then begin wake_up_terminal; writeln(term_out,'! You have to increase POOLSIZE.'); get_strings_started:=false; return; end; get_strings_started:=true; @* \[5] On-line and off-line printing. Messages that are sent to a user's terminal and to the transcript-log file are produced by several `|print|' procedures. These procedures will direct their output to a variety of places, based on the setting of the global variable |selector|, which has the following possible values: \yskip \hang |term_and_log|, the normal setting, prints on the terminal and on the transcript file. \hang |log_only|, prints only on the transcript file. \hang |term_only|, prints only on the terminal. \hang |no_print|, doesn't print at all. This is used only in rare cases before the transcript file is open. \hang |pseudo|, puts output into a cyclic buffer that is used by the |show_context| routine; when we get to that routine we shall discuss the reasoning behind this curious mode. \hang |new_string|, appends the output to the current string in the string pool. \hang 0 to 15, prints on one of the sixteen files for \.{\\write} output. \yskip \noindent The symbolic names `|term_and_log|', etc., have been assigned numeric codes that satisfy the convenient relations |no_print+1=term_only|, |no_print+2=log_only|, |term_only+2=log_only+1=term_and_log|. Three additional global variables, |tally| and |term_offset| and |file_offset|, record the number of characters that have been printed since they were most recently cleared to zero. We use |tally| to record the length of (possibly very long) stretches of printing; |term_offset| and |file_offset|, on the other hand, keep track of how many characters have appeared so far on the current line that has been output to the terminal or to the transcript file, respectively. @d no_print=16 {|selector| setting that makes data disappear} @d term_only=17 {printing is destined for the terminal only} @d log_only=18 {printing is destined for the transcript file only} @d term_and_log=19 {normal |selector| setting} @d pseudo=20 {special |selector| setting for |show_context|} @d new_string=21 {printing is deflected to the string pool} @d max_selector=21 {highest selector setting} @= @!log_file : alpha_file; {transcript of \TeX\ session} @!term_out_mode:halfword; @!term_out_translation:halfword; @!selector : 0..max_selector; {where to print a message} @!dig : array[0..22] of 0..15; {digits in a number being output} @!tally : integer; {the number of characters recently printed} @!term_offset : 0..max_print_line; {the number of characters on the current terminal line} @!file_offset : 0..max_print_line; {the number of characters on the current file line} @!trick_buf:array[0..error_line] of packed_ASCII_code; {circular buffer for pseudoprinting} @!trick_count: integer; {threshold for pseudoprinting, explained later} @!first_count: integer; {another variable for pseudoprinting} @!inhibit_par_tokens:boolean; { for minor adjustments to |show_token_list| } @ @= selector:=term_only; tally:=0; term_offset:=0; file_offset:=0; inhibit_par_tokens:=false; @ Macro abbreviations for output to the terminal and to the log file are defined here for convenience. Some systems need special conventions for terminal output, and it is possible to adhere to those conventions by changing |wterm|, |wterm_ln|, and |wterm_cr| in this section. @^system dependencies@> @d wterm(#)==write(term_out,#) @d wterm_ln(#)==writeln(term_out,#) @d wterm_cr==writeln(term_out) @d wlog(#)==write(log_file,#) @d wlog_ln(#)==writeln(log_file,#) @d wlog_cr==writeln(log_file) @ To end a line of text output, we call |print_ln|. @= procedure print_ln; {prints an end-of-line} begin case selector of term_and_log: begin wterm_cr; wlog_cr; term_offset:=0; file_offset:=0; end; log_only: begin wlog_cr; file_offset:=0; end; term_only: begin wterm_cr; term_offset:=0; end; no_print,pseudo,new_string: do_nothing; othercases writeln(write_file[selector]) endcases;@/ end; {|tally| is not affected} @ The |print_char| procedure sends one character to the desired destination, using the |xchr| array to map it into an external character compatible with |lua_input_ln|. All printing comes through |print_ln| or |print_char|. @d wterm_char(#)==begin if (#>=@"20)or(#=@"0A)or(#=@"0D)or(#=@"09) then wterm(xchr[#]) else begin if term_offset+2>=max_print_line then begin wterm_cr; term_offset:=0; end; incr(term_offset); wterm('^'); incr(term_offset); wterm('^'); wterm(xchr[#+64]); end; end @d fix_term_offset(#)== if ((#>=@"C0) and (#<=@"DF) and (term_offset+2>=max_print_line)) or ((#>=@"E0) and (#<=@"EF) and (term_offset+3>=max_print_line)) or ((#>=@"F0) and (term_offset+4>=max_print_line)) then begin wterm_cr; term_offset:=0; end @d fix_log_offset(#)== if ((#>=@"C0) and (#<=@"DF) and (file_offset+2>=max_print_line)) or ((#>=@"E0) and (#<=@"EF) and (file_offset+3>=max_print_line)) or ((#>=@"F0) and (file_offset+4>=max_print_line)) then begin wlog_cr; file_offset:=0; end @= procedure print_char(@!s:ASCII_code); {prints a single character} label exit; begin if @ then if selector The first 256 entries above the 17th unicode plane are used for a special trick: when \TeX\ has to print items in that range, it will instead print the character that results from substracting @"110000 from that value. This allows byte-oriented output to things like \.{\\specials} and \.{\\pdfliterals}. Todo: Perhaps it would be useful to do the same substraction while typesetting. @d print_lc_hex(#)==l:=#; if l<10 then print_char(l+"0") else print_char(l-10+"a"); @= function pool_to_unichar(@!t:pool_pointer):ASCII_code; var a,result:@!ASCII_code; {a utf char} b:@!quarterword; { a utf nibble} begin b := str_pool[t]; if b<=@"7F then a := b else if b>=@"F0 then begin a := (b-@"F0) * 64; b := str_pool[t+1]; a := (a + (b-128)) * 64; b := str_pool[t+2]; a := (a + (b-128)) * 64; b := str_pool[t+3]; a := a + (b-128); end else if b>=@"E0 then begin a := (b-@"E0) * 64; b := str_pool[t+1]; a := (a + (b-128)) * 64; b := str_pool[t+2]; a := a + (b-128); end else if b>=@"C0 then begin a := (b-@"C0) * 64; b := str_pool[t+1]; a := a + (b-128); end else begin { NI: this is an encoding error } wlog_ln('! Pool contains an invalid utf-8 sequence'); wterm_ln('! Pool contains an invalid utf-8 sequence'); @.Text line contains...@> help2("A funny symbol somehow ended up in the string pool.")@/ ("Just continue, I'll change it to 0xFFFD.");@/ deletions_allowed:=false; error; deletions_allowed:=true; a:=@"FFFD; end; result := a; pool_to_unichar := result; end ; procedure print(@!s:integer); {prints string |s|} label exit; var j:pool_pointer; {current character code position} @!v:integer; { a unicode char } begin if s>=str_ptr then s:="???" {this can't happen} @.???@> else if spseudo) then begin print_char(s); return; {internal strings are not expanded} end; if (@) then if selector=@"110000 then print_char(s-@"110000) else begin print_char(@"F0 + (s div @"40000)); print_char(@"80 + ((s mod @"40000) div @"1000)); print_char(@"80 + (((s mod @"40000) mod @"1000) div @"40)); print_char(@"80 + (((s mod @"40000) mod @"1000) mod @"40)); end; end; return; end; j:=str_start_macro(s); while j0)and(odd(selector)))or@| ((file_offset>0)and(selector>=log_only)) then print_ln; print(s); end; procedure print_nlp; {move to beginning of a line} begin if ((term_offset>0)and(odd(selector)))or@| ((file_offset>0)and(selector>=log_only)) then print_ln; end; @ Control sequence names, file names, and strings constructed with \.{\\string} might contain |ASCII_code| values that can't be printed using |print_char|. Therefore we use |slow_print| for them: @= procedure slow_print(@!s:integer); {prints string |s|} var j:pool_pointer; {current character code position} begin if (s>=str_ptr) or (s= print_banner; @ The procedure |print_nl| is like |print|, but it makes sure that the string appears at the beginning of a new line. (moved) @= @ The procedure |print_esc| prints a string that is preceded by the user's escape character (which is usually a backslash). @= procedure print_esc(@!s:str_number); {prints escape character, then |s|} var c:integer; {the escape character code} begin @; if c>=0 then if c= procedure print_the_digs(@!k:eight_bits); {prints |dig[k-1]|$\,\ldots\,$|dig[0]|} begin while k>0 do begin decr(k); if dig[k]<10 then print_char("0"+dig[k]) else print_char("A"-10+dig[k]); end; end; @ The following procedure, which prints out the decimal representation of a given integer |n|, has been written carefully so that it works properly if |n=0| or if |(-n)| would cause overflow. It does not apply |mod| or |div| to negative arguments, since such operations are not implemented consistently by all \PASCAL\ compilers. @= procedure print_int(@!n:integer); {prints an integer in decimal form} var k:0..23; {index to current digit; we assume that $|n|<10^{23}$} @!m:integer; {used to negate |n| in possibly dangerous cases} begin k:=0; if n<0 then begin print_char("-"); if n>-100000000 then negate(n) else begin m:=-1-n; n:=m div 10; m:=(m mod 10)+1; k:=1; if m<10 then dig[0]:=m else begin dig[0]:=0; incr(n); end; end; end; repeat dig[k]:=n mod 10; n:=n div 10; incr(k); until n=0; print_the_digs(k); end; @ Here is a trivial procedure to print two digits; it is usually called with a parameter in the range |0<=n<=99|. @p procedure print_two(@!n:integer); {prints two least significant digits} begin n:=abs(n) mod 100; print_char("0"+(n div 10)); print_char("0"+(n mod 10)); end; @ Hexadecimal printing of nonnegative integers is accomplished by |print_hex|. @p procedure print_hex(@!n:integer); {prints a positive integer in hexadecimal form} var k:0..22; {index to current digit; we assume that $0\L n<16^{22}$} begin k:=0; print_char(""""); repeat dig[k]:=n mod 16; n:=n div 16; incr(k); until n=0; print_the_digs(k); end; @ Old versions of \TeX\ needed a procedure called |print_ASCII| whose function is now subsumed by |print|. We retain the old name here as a possible aid to future software arch\ae ologists. @d print_ASCII == print @d print_font_name(#)==begin print(tex_font_name(#)); flush_string; end @ Roman numerals are produced by the |print_roman_int| routine. Readers who like puzzles might enjoy trying to figure out how this tricky code works; therefore no explanation will be given. Notice that 1990 yields \.{mcmxc}, not \.{mxm}. @p procedure print_roman_int(@!n:integer); label exit; var j,@!k: pool_pointer; {mysterious indices into |str_pool|} @!u,@!v: nonnegative_integer; {mysterious numbers} begin j:=str_start_macro("m2d5c2l5x2v5i"); v:=1000; loop@+ begin while n>=v do begin print_char(so(str_pool[j])); n:=n-v; end; if n<=0 then return; {nonpositive input produces no output} k:=j+2; u:=v div (so(str_pool[k-1])-"0"); if str_pool[k-1]=si("2") then begin k:=k+2; u:=u div (so(str_pool[k-1])-"0"); end; if n+u>=v then begin print_char(so(str_pool[k])); n:=n+u; end else begin j:=j+2; v:=v div (so(str_pool[j-1])-"0"); end; end; exit:end; @ The |print| subroutine will not print a string that is still being created. The following procedure will. @p procedure print_current_string; {prints a yet-unmade string} var j:pool_pointer; {points to current character code} begin j:=str_start_macro(str_ptr); while j= procedure prompt_input(s:str_number); begin wake_up_terminal; print(s); term_input; end; @ @p procedure term_input; {gets a line from the terminal} var k:0..buf_size; {index into |buffer|} begin update_terminal; {now the user sees the prompt for sure} if not input_ln(term_in,true) then fatal_error("End of file on the terminal!"); @.End of file on the terminal@> term_offset:=0; {the user's line ended with \<\rm return>} decr(selector); {prepare to echo the input} if last<>first then for k:=first to last-1 do print(buffer[k]); print_ln; incr(selector); {restore previous status} end; @* \[6] Reporting errors. When something anomalous is detected, \TeX\ typically does something like this: $$\vbox{\halign{#\hfil\cr |print_err("Something anomalous has been detected");|\cr |help3("This is the first line of my offer to help.")|\cr |("This is the second line. I'm trying to")|\cr |("explain the best way for you to proceed.");|\cr |error;|\cr}}$$ A two-line help message would be given using |help2|, etc.; these informal helps should use simple vocabulary that complements the words used in the official error message that was printed. (Outside the U.S.A., the help messages should preferably be translated into the local vernacular. Each line of help is at most 60 characters long, in the present implementation, so that |max_print_line| will not be exceeded.) The |print_err| procedure supplies a `\.!' before the official message, and makes sure that the terminal is awake if a stop is going to occur. The |error| procedure supplies a `\..' after the official message, then it shows the location of the error; and if |interaction=error_stop_mode|, it also enters into a dialog with the user, during which time the help message may be printed. @^system dependencies@> @ The global variable |interaction| has four settings, representing increasing amounts of user interaction: @d batch_mode=0 {omits all stops and omits terminal output} @d nonstop_mode=1 {omits all stops} @d scroll_mode=2 {omits error stops} @d error_stop_mode=3 {stops at every opportunity to interact} @d print_err(#)==begin if interaction=error_stop_mode then wake_up_terminal; print_nl("! "); print(#); last_error:=#; end @d print_warn(#)==begin if interaction=error_stop_mode then wake_up_terminal; if prepend_nl then begin print_nl(""); print_ln end; print_nl(#); end @= @!interaction:batch_mode..error_stop_mode; {current level of interaction} @!last_error:str_number; @ @=interaction:=error_stop_mode; @ \TeX\ is careful not to call |error| when the print |selector| setting might be unusual. The only possible values of |selector| at the time of error messages are \yskip\hang|no_print| (when |interaction=batch_mode| and |log_file| not yet open); \hang|term_only| (when |interaction>batch_mode| and |log_file| not yet open); \hang|log_only| (when |interaction=batch_mode| and |log_file| is open); \hang|term_and_log| (when |interaction>batch_mode| and |log_file| is open). @= if interaction=batch_mode then selector:=no_print@+else selector:=term_only @ A global variable |deletions_allowed| is set |false| if the |get_next| routine is active when |error| is called; this ensures that |get_next| and related routines like |get_token| will never be called recursively. A similar interlock is provided by |set_box_allowed|. @^recursion@> The global variable |history| records the worst level of error that has been detected. It has four possible values: |spotless|, |warning_issued|, |error_message_issued|, and |fatal_error_stop|. Another global variable, |error_count|, is increased by one when an |error| occurs without an interactive dialog, and it is reset to zero at the end of every paragraph. If |error_count| reaches 100, \TeX\ decides that there is no point in continuing further. @d spotless=0 {|history| value when nothing has been amiss yet} @d warning_issued=1 {|history| value when |begin_diagnostic| has been called} @d error_message_issued=2 {|history| value when |error| has been called} @d fatal_error_stop=3 {|history| value when termination was premature} @= @!deletions_allowed:boolean; {is it safe for |error| to call |get_token|?} @!set_box_allowed:boolean; {is it safe to do a \.{\\setbox} assignment?} @!history:spotless..fatal_error_stop; {has the source input been clean so far?} @!error_count:-1..100; {the number of scrolled errors since the last paragraph ended} @ The value of |history| is initially |fatal_error_stop|, but it will be changed to |spotless| if \TeX\ survives the initialization process. @= deletions_allowed:=true; set_box_allowed:=true; error_count:=0; {|history| is initialized elsewhere} @ Since errors can be detected almost anywhere in \TeX, we want to declare the error procedures near the beginning of the program. But the error procedures in turn use some other procedures, which need to be declared |forward| before we get to |error| itself. It is possible for |error| to be called recursively if some error arises when |get_token| is being used to delete a token, and/or if some fatal error occurs while \TeX\ is trying to fix a non-fatal one. But such recursion @^recursion@> is never more than two levels deep. @= procedure@?normalize_selector; forward;@t\2@>@/ procedure@?get_token; forward;@t\2@>@/ procedure@?term_input; forward;@t\2@>@/ procedure@?show_context; forward;@t\2@>@/ procedure@?begin_file_reading; forward;@t\2@>@/ procedure@?open_log_file; forward;@t\2@>@/ procedure@?close_files_and_terminate; forward;@t\2@>@/ procedure@?clear_for_error_prompt; forward;@t\2@>@/ procedure@?give_err_help; forward;@t\2@>@/ @t\4\hskip-\fontdimen2\font@>@;@+@!debug@+procedure@?debug_help; forward;@;@+gubed @ Individual lines of help are recorded in the array |help_line|, which contains entries in positions |0..(help_ptr-1)|. They should be printed in reverse order, i.e., with |help_line[0]| appearing last. @d hlp1(#)==help_line[0]:=#;@+end @d hlp2(#)==help_line[1]:=#; hlp1 @d hlp3(#)==help_line[2]:=#; hlp2 @d hlp4(#)==help_line[3]:=#; hlp3 @d hlp5(#)==help_line[4]:=#; hlp4 @d hlp6(#)==help_line[5]:=#; hlp5 @d help0==help_ptr:=0 {sometimes there might be no help} @d help1==@+begin help_ptr:=1; hlp1 {use this with one help line} @d help2==@+begin help_ptr:=2; hlp2 {use this with two help lines} @d help3==@+begin help_ptr:=3; hlp3 {use this with three help lines} @d help4==@+begin help_ptr:=4; hlp4 {use this with four help lines} @d help5==@+begin help_ptr:=5; hlp5 {use this with five help lines} @d help6==@+begin help_ptr:=6; hlp6 {use this with six help lines} @p procedure dohelp5 (@!a,b,c,d,e:str_number); begin help5(a)(b)(c)(d)(e); end; @# procedure dohelp4 (@!a,b,c,d:str_number); begin help4(a)(b)(c)(d); end; @# procedure dohelp3 (@!a,b,c:str_number); begin help3(a)(b)(c); end; @# procedure dohelp2 (@!a,b:str_number); begin help2(a)(b); end; @# procedure dohelp1 (@!a:str_number); begin help1(a); end; @# procedure do_print_err (s:str_number); begin print_err(s); end; @ @= @!help_line:array[0..5] of str_number; {helps for the next |error|} @!help_ptr:0..6; {the number of help lines present} @!use_err_help:boolean; {should the |err_help| list be shown?} @ @= help_ptr:=0; use_err_help:=false; @ The |jump_out| procedure just cuts across all active procedure levels and goes to |end_of_TEX|. This is the only nontrivial |@!goto| statement in the whole program. It is used when there is no recovery from a particular error. Some \PASCAL\ compilers do not implement non-local |goto| statements. @^system dependencies@> In such cases the body of |jump_out| should simply be `|close_files_and_terminate|;\thinspace' followed by a call on some system procedure that quietly terminates the program. @= procedure jump_out; begin goto end_of_TEX; end; @ Here now is the general |error| routine. @= procedure error; {completes the job of error reporting} label continue,exit; var c:ASCII_code; {what the user types} callback_id:integer; @!s1,@!s2,@!s3,@!s4:integer; {used to save global variables when deleting tokens} @!t:boolean; begin if history0 then t := run_callback(callback_id,'->'); show_context; if interaction=error_stop_mode then @; incr(error_count); if error_count=100 then begin print_nl("(That makes 100 errors; please try again.)"); @.That makes 100 errors...@> history:=fatal_error_stop; jump_out; end; @; exit:end; @ @= loop@+begin continue: clear_for_error_prompt; prompt_input("? "); @.?\relax@> if last=first then return; c:=buffer[first]; if c>="a" then c:=c+"A"-"a"; {convert to uppercase} @; end @ It is desirable to provide an `\.E' option here that gives the user an easy way to return from \TeX\ to the system editor, with the offending line ready to be edited. But such an extension requires some system wizardry, so the present implementation simply types out the name of the file that should be edited and the relevant line number. @^system dependencies@> There is a secret `\.D' option available when the debugging routines haven't been commented~out. @^debugging@> @= case c of "0","1","2","3","4","5","6","7","8","9": if deletions_allowed then @; @t\4\4@>@;@+@!debug "D": begin debug_help; goto continue;@+end;@+gubed@/ "E": if base_ptr>0 then begin print_nl("You want to edit file "); @.You want to edit file x@> slow_print(input_stack[base_ptr].name_field); print(" at line "); print_int(line); interaction:=scroll_mode; jump_out; end; "H": @; "I":@; "Q","R","S":@; "X":begin interaction:=scroll_mode; jump_out; end; othercases do_nothing endcases;@/ @ @ @= begin print("Type to proceed, S to scroll future error messages,");@/ @.Type to proceed...@> print_nl("R to run without stopping, Q to run quietly,");@/ print_nl("I to insert something, "); if base_ptr>0 then print("E to edit your file,"); if deletions_allowed then print_nl("1 or ... or 9 to ignore the next 1 to 9 tokens of input,"); print_nl("H for help, X to quit."); end @ Here the author of \TeX\ apologizes for making use of the numerical relation between |"Q"|, |"R"|, |"S"|, and the desired interaction settings |batch_mode|, |nonstop_mode|, |scroll_mode|. @^Knuth, Donald Ervin@> @= begin error_count:=0; interaction:=batch_mode+c-"Q"; print("OK, entering "); case c of "Q":begin print_esc("batchmode"); decr(selector); end; "R":print_esc("nonstopmode"); "S":print_esc("scrollmode"); end; {there are no other cases} print("..."); print_ln; update_terminal; return; end @ When the following code is executed, |buffer[(first+1)..(last-1)]| may contain the material inserted by the user; otherwise another prompt will be given. In order to understand this part of the program fully, you need to be familiar with \TeX's input stacks. @= begin begin_file_reading; {enter a new syntactic level for terminal input} {now |state=mid_line|, so an initial blank space will count as a blank} if last>first+1 then begin loc:=first+1; buffer[first]:=" "; end else begin prompt_input("insert>"); loc:=first; @.insert>@> end; first:=last; cur_input.limit_field:=last-1; {no |end_line_char| ends this line} return; end @ We allow deletion of up to 99 tokens at a time. @= begin s1:=cur_tok; s2:=cur_cmd; s3:=cur_chr; s4:=align_state; align_state:=1000000; OK_to_interrupt:=false; if (last>first+1) and (buffer[first+1]>="0")and(buffer[first+1]<="9") then c:=c*10+buffer[first+1]-"0"*11 else c:=c-"0"; while c>0 do begin get_token; {one-level recursive call of |error| is possible} decr(c); end; cur_tok:=s1; cur_cmd:=s2; cur_chr:=s3; align_state:=s4; OK_to_interrupt:=true; help2("I have just deleted some text, as you asked.")@/ ("You can now delete more, or insert, or whatever."); show_context; goto continue; end @ @= begin if use_err_help then begin give_err_help; use_err_help:=false; end else begin if help_ptr=0 then help2("Sorry, I don't know how to help in this situation.")@/ @t\kern1em@>("Maybe you should try asking a human?"); repeat decr(help_ptr); print(help_line[help_ptr]); print_ln; until help_ptr=0; end; help4("Sorry, I already gave what help I could...")@/ ("Maybe you should try asking a human?")@/ ("An error might have occurred before I noticed any problems.")@/ ("``If all else fails, read the instructions.''");@/ goto continue; end @ @= if interaction>batch_mode then decr(selector); {avoid terminal output} if use_err_help then begin print_ln; give_err_help; end else while help_ptr>0 do begin decr(help_ptr); print_nl(help_line[help_ptr]); end; print_ln; if interaction>batch_mode then incr(selector); {re-enable terminal output} print_ln @ A dozen or so error messages end with a parenthesized integer, so we save a teeny bit of program space by declaring the following procedure: @p procedure int_error(@!n:integer); begin print(" ("); print_int(n); print_char(")"); error; end; @ In anomalous cases, the print selector might be in an unknown state; the following subroutine is called to fix things just enough to keep running a bit longer. @p procedure normalize_selector; begin if log_opened then selector:=term_and_log else selector:=term_only; if job_name=0 then open_log_file; if interaction=batch_mode then decr(selector); end; @ The following procedure prints \TeX's last words before dying. @d succumb==begin if interaction=error_stop_mode then interaction:=scroll_mode; {no more interaction} if log_opened then error; @!debug if interaction>batch_mode then debug_help;@+gubed@;@/ history:=fatal_error_stop; jump_out; {irrecoverable error} end @= procedure fatal_error(@!s:str_number); {prints |s|, and that's it} begin normalize_selector;@/ print_err("Emergency stop"); help1(s); succumb; @.Emergency stop@> end; @ Here is the most dreaded error message. @= procedure lua_norm_error(@!s:str_number); {lua found a problem} begin print_err("LuaTeX error "); print(s); help2("The lua interpreter ran into a problem, so the")@/ ("remainder of this lua chunk will be ignored."); error; end; procedure lua_fatal_error(@!s:str_number); {lua found a problem} begin normalize_selector; print_err("LuaTeX fatal error "); print(s); succumb; end; procedure overflow(@!s:str_number;@!n:integer); {stop due to finiteness} begin normalize_selector; print_err("TeX capacity exceeded, sorry ["); @.TeX capacity exceeded ...@> print(s); print_char("="); print_int(n); print_char("]"); help2("If you really absolutely need more capacity,")@/ ("you can ask a wizard to enlarge me."); succumb; end; procedure overflow_ocp_buf_size; begin overflow("ocp_buf_size",ocp_buf_size); end; procedure overflow_ocp_stack_size; begin overflow("ocp_stack_size",ocp_stack_size); end; @ The program might sometime run completely amok, at which point there is no choice but to stop. If no previous error has been detected, that's bad news; a message is printed that is really intended for the \TeX\ maintenance person instead of the user (unless the user has been particularly diabolical). The index entries for `this can't happen' may help to pinpoint the problem. @^dry rot@> @= procedure confusion(@!s:str_number); {consistency check violated; |s| tells where} begin normalize_selector; if history help1("I'm broken. Please show this to someone who can fix can fix"); end else begin print_err("I can't go on meeting you like this"); @.I can't go on...@> help2("One of your faux pas seems to have wounded me deeply...")@/ ("in fact, I'm barely conscious. Please fix it and try again."); end; succumb; end; @ Users occasionally want to interrupt \TeX\ while it's running. If the \PASCAL\ runtime system allows this, one can implement a routine that sets the global variable |interrupt| to some nonzero value when such an interrupt is signalled. Otherwise there is probably at least a way to make |interrupt| nonzero using the \PASCAL\ debugger. @^system dependencies@> @^debugging@> @= @!interrupt:integer; {should \TeX\ pause for instructions?} @!OK_to_interrupt:boolean; {should interrupts be observed?} @!detokenized_line:boolean; {indicates this is a 'detokenized' input line } @!line_catcode_table:integer; {the used catcode table for input lines} @!local_catcode_table:boolean; {the used catcode table for input lines} @!static_int_base:integer; {C version of |int_base|} @ @p procedure check_interrupt; begin if interrupt<>0 then pause_for_instructions; end; @ @= interrupt:=0; OK_to_interrupt:=true; detokenized_line:=false; line_catcode_table:=0; local_catcode_table:=false; @ When an interrupt has been detected, the program goes into its highest interaction level and lets the user have nearly the full flexibility of the |error| routine. \TeX\ checks for interrupts only at times when it is safe to do this. @p procedure pause_for_instructions; begin if OK_to_interrupt then begin interaction:=error_stop_mode; if (selector=log_only)or(selector=no_print) then incr(selector); print_err("Interruption"); @.Interruption@> help3("You rang?")@/ ("Try to insert some instructions for me (e.g.,`I\showlists'),")@/ ("unless you just want to quit by typing `X'."); deletions_allowed:=false; error; deletions_allowed:=true; interrupt:=0; end; end; @* \[7] Arithmetic with scaled dimensions. The principal computations performed by \TeX\ are done entirely in terms of integers less than $2^{31}$ in magnitude; and divisions are done only when both dividend and divisor are nonnegative. Thus, the arithmetic specified in this program can be carried out in exactly the same way on a wide variety of computers, including some small ones. Why? Because the arithmetic calculations need to be spelled out precisely in order to guarantee that \TeX\ will produce identical output on different machines. If some quantities were rounded differently in different implementations, we would find that line breaks and even page breaks might occur in different places. Hence the arithmetic of \TeX\ has been designed with care, and systems that claim to be implementations of \TeX82 should follow precisely the @:TeX82}{\TeX82@> calculations as they appear in the present program. (Actually there are three places where \TeX\ uses |div| with a possibly negative numerator. These are harmless; see |div| in the index. Also if the user sets the \.{\\time} or the \.{\\year} to a negative value, some diagnostic information will involve negative-numerator division. The same remarks apply for |mod| as well as for |div|.) @ Here is a routine that calculates half of an integer, using an unambiguous convention with respect to signed odd numbers. @p function half(@!x:integer):integer; begin if odd(x) then half:=(x+1) div 2 else half:=x @!div 2; end; @ Fixed-point arithmetic is done on {\sl scaled integers\/} that are multiples of $2^{-16}$. In other words, a binary point is assumed to be sixteen bit positions from the right end of a binary computer word. @d unity == @'200000 {$2^{16}$, represents 1.00000} @d two == @'400000 {$2^{17}$, represents 2.00000} @= @!scaled = integer; {this type is used for scaled integers} @!nonnegative_integer=0..@'17777777777; {$0\L x<2^{31}$} @!small_number=0..63; {this type is self-explanatory} @ The following function is used to create a scaled integer from a given decimal fraction $(.d_0d_1\ldots d_{k-1})$, where |0<=k<=17|. The digit $d_i$ is given in |dig[i]|, and the calculation produces a correctly rounded result. @p function round_decimals(@!k:small_number) : scaled; {converts a decimal fraction} var a:integer; {the accumulator} begin a:=0; while k>0 do begin decr(k); a:=(a+dig[k]*two) div 10; end; round_decimals:=(a+1) div 2; end; @ Conversely, here is a procedure analogous to |print_int|. If the output of this procedure is subsequently read by \TeX\ and converted by the |round_decimals| routine above, it turns out that the original value will be reproduced exactly; the ``simplest'' such decimal number is output, but there is always at least one digit following the decimal point. The invariant relation in the \&{repeat} loop is that a sequence of decimal digits yet to be printed will yield the original number if and only if they form a fraction~$f$ in the range $s-\delta\L10\cdot2^{16}funity then s:=s+@'100000-50000; {round the last digit} print_char("0"+(s div unity)); s:=10*(s mod unity); delta:=delta*10; until s<=delta; end; @ Physical sizes that a \TeX\ user specifies for portions of documents are represented internally as scaled points. Thus, if we define an `sp' (scaled @^sp@> point) as a unit equal to $2^{-16}$ printer's points, every dimension inside of \TeX\ is an integer number of sp. There are exactly 4,736,286.72 sp per inch. Users are not allowed to specify dimensions larger than $2^{30}-1$ sp, which is a distance of about 18.892 feet (5.7583 meters); two such quantities can be added without overflow on a 32-bit computer. The present implementation of \TeX\ does not check for overflow when @^Overflow in arithmetic@> dimensions are added or subtracted. This could be done by inserting a few dozen tests of the form `\ignorespaces|if x>=@'10000000000 then @t\\{report\_overflow}@>|', but the chance of overflow is so remote that such tests do not seem worthwhile. \TeX\ needs to do only a few arithmetic operations on scaled quantities, other than addition and subtraction, and the following subroutines do most of the work. A single computation might use several subroutine calls, and it is desirable to avoid producing multiple error messages in case of arithmetic overflow; so the routines set the global variable |arith_error| to |true| instead of reporting errors directly to the user. Another global variable, |remainder|, holds the remainder after a division. @= @!arith_error:boolean; {has arithmetic overflow occurred recently?} @!remainder:scaled; {amount subtracted to get an exact division} @ The first arithmetical subroutine we need computes $nx+y$, where |x| and~|y| are |scaled| and |n| is an integer. We will also use it to multiply integers. @d nx_plus_y(#)==mult_and_add(#,@'7777777777) @d mult_integers(#)==mult_and_add(#,0,@'17777777777) @p function mult_and_add(@!n:integer;@!x,@!y,@!max_answer:scaled):scaled; begin if n<0 then begin negate(x); negate(n); end; if n=0 then mult_and_add:=y else if ((x<=(max_answer-y) div n)and(-x<=(max_answer+y) div n)) then mult_and_add:=n*x+y else begin arith_error:=true; mult_and_add:=0; end; end; @ We also need to divide scaled dimensions by integers. @p function x_over_n(@!x:scaled;@!n:integer):scaled; var negative:boolean; {should |remainder| be negated?} begin negative:=false; if n=0 then begin arith_error:=true; x_over_n:=0; remainder:=x; end else begin if n<0 then begin negate(x); negate(n); negative:=true; end; if x>=0 then begin x_over_n:=x div n; remainder:=x mod n; end else begin x_over_n:=-((-x) div n); remainder:=-((-x) mod n); end; end; if negative then negate(remainder); end; @ Then comes the multiplication of a scaled number by a fraction |n/d|, where |n| and |d| are nonnegative integers |<=@t$2^{16}$@>| and |d| is positive. It would be too dangerous to multiply by~|n| and then divide by~|d|, in separate operations, since overflow might well occur; and it would be too inaccurate to divide by |d| and then multiply by |n|. Hence this subroutine simulates 1.5-precision arithmetic. @p function xn_over_d(@!x:scaled; @!n,@!d:integer):scaled; var positive:boolean; {was |x>=0|?} @!t,@!u,@!v:nonnegative_integer; {intermediate quantities} begin if x>=0 then positive:=true else begin negate(x); positive:=false; end; t:=(x mod @'100000)*n; u:=(x div @'100000)*n+(t div @'100000); v:=(u mod d)*@'100000 + (t mod @'100000); if u div d>=@'100000 then arith_error:=true else u:=@'100000*(u div d) + (v div d); if positive then begin xn_over_d:=u; remainder:=v mod d; end else begin xn_over_d:=-u; remainder:=-(v mod d); end; end; @ The next subroutine is used to compute the ``badness'' of glue, when a total~|t| is supposed to be made from amounts that sum to~|s|. According to {\sl The \TeX book}, the badness of this situation is $100(t/s)^3$; however, badness is simply a heuristic, so we need not squeeze out the last drop of accuracy when computing it. All we really want is an approximation that has similar properties. @:TeXbook}{\sl The \TeX book@> The actual method used to compute the badness is easier to read from the program than to describe in words. It produces an integer value that is a reasonably close approximation to $100(t/s)^3$, and all implementations of \TeX\ should use precisely this method. Any badness of $2^{13}$ or more is treated as infinitely bad, and represented by 10000. It is not difficult to prove that $$\hbox{|badness(t+1,s)>=badness(t,s) >=badness(t,s+1)|}.$$ The badness function defined here is capable of computing at most 1095 distinct values, but that is plenty. @d inf_bad = 10000 {infinitely bad value} @p function badness(@!t,@!s:scaled):halfword; {compute badness, given |t>=0|} var r:integer; {approximation to $\alpha t/s$, where $\alpha^3\approx 100\cdot2^{18}$} begin if t=0 then badness:=0 else if s<=0 then badness:=inf_bad else begin if t<=7230584 then r:=(t*297) div s {$297^3=99.94\times2^{18}$} else if s>=1663497 then r:=t div (s div 297) else r:=t; if r>1290 then badness:=inf_bad {$1290^3<2^{31}<1291^3$} else badness:=(r*r*r+@'400000) div @'1000000; end; {that was $r^3/2^{18}$, rounded to the nearest integer} end; @ When \TeX\ ``packages'' a list into a box, it needs to calculate the proportionality ratio by which the glue inside the box should stretch or shrink. This calculation does not affect \TeX's decision making, so the precise details of rounding, etc., in the glue calculation are not of critical importance for the consistency of results on different computers. We shall use the type |glue_ratio| for such proportionality ratios. A glue ratio should take the same amount of memory as an |integer| (usually 32 bits) if it is to blend smoothly with \TeX's other data structures. Thus |glue_ratio| should be equivalent to |short_real| in some implementations of \PASCAL. Alternatively, it is possible to deal with glue ratios using nothing but fixed-point arithmetic; see {\sl TUGboat \bf3},1 (March 1982), 10--27. (But the routines cited there must be modified to allow negative glue ratios.) @^system dependencies@> @d set_glue_ratio_zero(#) == #:=0.0 {store the representation of zero ratio} @d set_glue_ratio_one(#) == #:=1.0 {store the representation of unit ratio} @d float(#) == # {convert from |glue_ratio| to type |real|} @d unfloat(#) == # {convert from |real| to type |glue_ratio|} @d float_constant(#) == #.0 {convert |integer| constant to |real|} @= @!glue_ratio=real; {one-word representation of a glue expansion factor} @* \[7b] Random numbers. \font\tenlogo=logo10 % font used for the METAFONT logo \def\MP{{\tenlogo META}\-{\tenlogo POST}} This section is (almost) straight from MetaPost. I had to change the types (use |integer| instead of |fraction|), but that should not have any influence on the actual calculations (the original comments refer to quantities like |fraction_four| ($2^{30}$), and that is the same as the numeric representation of |maxdimen|). I've copied the low-level variables and routines that are needed, but only those (e.g. |m_log|), not the accompanying ones like |m_exp|. Most of the following low-level numeric routines are only needed within the calculation of |norm_rand|. I've been forced to rename |make_fraction| to |make_frac| because TeX already has a routine by that name with a wholly different function (it creates a |fraction_noad| for math typesetting) -- Taco And now let's complete our collection of numeric utility routines by considering random number generation. \MP\ generates pseudo-random numbers with the additive scheme recommended in Section 3.6 of {\sl The Art of Computer Programming}; however, the results are random fractions between 0 and |fraction_one-1|, inclusive. There's an auxiliary array |randoms| that contains 55 pseudo-random fractions. Using the recurrence $x_n=(x_{n-55}-x_{n-31})\bmod 2^{28}$, we generate batches of 55 new $x_n$'s at a time by calling |new_randoms|. The global variable |j_random| tells which element has most recently been consumed. @= @!randoms:array[0..54] of integer; {the last 55 random values generated} @!j_random:0..54; {the number of unused |randoms|} @!random_seed:scaled; {the default random seed} @ A small bit of metafont is needed. @d fraction_half==@'1000000000 {$2^{27}$, represents 0.50000000} @d fraction_one==@'2000000000 {$2^{28}$, represents 1.00000000} @d fraction_four==@'10000000000 {$2^{30}$, represents 4.00000000} @d el_gordo == @'17777777777 {$2^{31}-1$, the largest value that \MP\ likes} @d halfp(#)==(#) div 2 @d double(#) == #:=#+# {multiply a variable by two} @ The |make_frac| routine produces the |fraction| equivalent of |p/q|, given integers |p| and~|q|; it computes the integer $f=\lfloor2^{28}p/q+{1\over2}\rfloor$, when $p$ and $q$ are positive. If |p| and |q| are both of the same scaled type |t|, the ``type relation'' |make_frac(t,t)=fraction| is valid; and it's also possible to use the subroutine ``backwards,'' using the relation |make_frac(t,fraction)=t| between scaled types. If the result would have magnitude $2^{31}$ or more, |make_frac| sets |arith_error:=true|. Most of \MP's internal computations have been designed to avoid this sort of error. If this subroutine were programmed in assembly language on a typical machine, we could simply compute |(@t$2^{28}$@>*p)div q|, since a double-precision product can often be input to a fixed-point division instruction. But when we are restricted to \PASCAL\ arithmetic it is necessary either to resort to multiple-precision maneuvering or to use a simple but slow iteration. The multiple-precision technique would be about three times faster than the code adopted here, but it would be comparatively long and tricky, involving about sixteen additional multiplications and divisions. This operation is part of \MP's ``inner loop''; indeed, it will consume nearly 10\pct! of the running time (exclusive of input and output) if the code below is left unchanged. A machine-dependent recoding will therefore make \MP\ run faster. The present implementation is highly portable, but slow; it avoids multiplication and division except in the initial stage. System wizards should be careful to replace it with a routine that is guaranteed to produce identical results in all cases. @^system dependencies@> As noted below, a few more routines should also be replaced by machine-dependent code, for efficiency. But when a procedure is not part of the ``inner loop,'' such changes aren't advisable; simplicity and robustness are preferable to trickery, unless the cost is too high. @^inner loop@> @p function make_frac(@!p,@!q:integer):integer; var @!f:integer; {the fraction bits, with a leading 1 bit} @!n:integer; {the integer part of $\vert p/q\vert$} @!negative:boolean; {should the result be negated?} @!be_careful:integer; {disables certain compiler optimizations} begin if p>=0 then negative:=false else begin negate(p); negative:=true; end; if q<=0 then begin debug if q=0 then confusion("/");@;@+gubed@;@/ @:this can't happen /}{\quad \./@> negate(q); negative:=not negative; end; n:=p div q; p:=p mod q; if n>=8 then begin arith_error:=true; if negative then make_frac:=-el_gordo@+else make_frac:=el_gordo; end else begin n:=(n-1)*fraction_one; @; if negative then make_frac:=-(f+n)@+else make_frac:=f+n; end; end; @ The |repeat| loop here preserves the following invariant relations between |f|, |p|, and~|q|: (i)~|0<=p @= f:=1; repeat be_careful:=p-q; p:=be_careful+p; if p>=0 then f:=f+f+1 else begin double(f); p:=p+q; end; until f>=fraction_one; be_careful:=p-q; if be_careful+p>=0 then incr(f) @ @p function take_frac(@!q:integer;@!f:integer):integer; var @!p:integer; {the fraction so far} @!negative:boolean; {should the result be negated?} @!n:integer; {additional multiple of $q$} @!be_careful:integer; {disables certain compiler optimizations} begin @=0| and |q>0|@>; if f; be_careful:=n-el_gordo; if be_careful+p>0 then begin arith_error:=true; n:=el_gordo-p; end; if negative then take_frac:=-(n+p) else take_frac:=n+p; end; @ @=0| and |q>0|@>= if f>=0 then negative:=false else begin negate(f); negative:=true; end; if q<0 then begin negate(q); negative:=not negative; end; @ The invariant relations in this case are (i)~$\lfloor(qf+p)/2^k\rfloor =\lfloor qf_0/2^{28}+{1\over2}\rfloor$, where $k$ is an integer and $f_0$ is the original value of~$f$; (ii)~$2^k\L f<2^{k+1}$. @^inner loop@> @= p:=fraction_half; {that's $2^{27}$; the invariants hold now with $k=28$} if q= @!two_to_the:array[0..30] of integer; {powers of two} @!spec_log:array[1..28] of integer; {special logarithms} @ @= two_to_the[0]:=1; for k:=1 to 30 do two_to_the[k]:=2*two_to_the[k-1]; spec_log[1]:=93032640; spec_log[2]:=38612034; spec_log[3]:=17922280; spec_log[4]:=8662214; spec_log[5]:=4261238; spec_log[6]:=2113709; spec_log[7]:=1052693; spec_log[8]:=525315; spec_log[9]:=262400; spec_log[10]:=131136; spec_log[11]:=65552; spec_log[12]:=32772; spec_log[13]:=16385; for k:=14 to 27 do spec_log[k]:=two_to_the[27-k]; spec_log[28]:=1; @ @p function m_log(@!x:integer):integer; var @!y,@!z:integer; {auxiliary registers} @!k:integer; {iteration counter} begin if x<=0 then @ else begin y:=1302456956+4-100; {$14\times2^{27}\ln2\approx1302456956.421063$} z:=27595+6553600; {and $2^{16}\times .421063\approx 27595$} while xfraction_four+4 do @; m_log:=y div 8; end; end; @ @= begin z:=((x-1) div two_to_the[k])+1; {$z=\lceil x/2^k\rceil$} while x= begin print_err("Logarithm of "); @.Logarithm...replaced by 0@> print_scaled(x); print(" has been replaced by 0"); help2("Since I don't take logs of non-positive numbers,")@/ ("I'm zeroing this one. Proceed, with fingers crossed."); error; m_log:=0; end @ The following somewhat different subroutine tests rigorously if $ab$ is greater than, equal to, or less than~$cd$, given integers $(a,b,c,d)$. In most cases a quick decision is reached. The result is $+1$, 0, or~$-1$ in the three respective cases. @d return_sign(#)==begin ab_vs_cd:=#; return; end @p function ab_vs_cd(@!a,b,c,d:integer):integer; label exit; var @!q,@!r:integer; {temporary registers} begin @=0|, |b,d>0|@>; loop@+ begin q := a div d; r := c div b; if q<>r then if q>r then return_sign(1)@+else return_sign(-1); q := a mod d; r := c mod b; if r=0 then if q=0 then return_sign(0)@+else return_sign(1); if q=0 then return_sign(-1); a:=b; b:=q; c:=d; d:=r; end; {now |a>d>0| and |c>b>0|} exit:end; @ @= if a<0 then begin negate(a); negate(b); end; if c<0 then begin negate(c); negate(d); end; if d<=0 then begin if b>=0 then if ((a=0)or(b=0))and((c=0)or(d=0)) then return_sign(0) else return_sign(1); if d=0 then if a=0 then return_sign(0)@+else return_sign(-1); q:=a; a:=c; c:=q; q:=-b; b:=-d; d:=q; end else if b<=0 then begin if b<0 then if a>0 then return_sign(-1); if c=0 then return_sign(0) else return_sign(-1); end @ To consume a random integer, the program below will say `|next_random|' and then it will fetch |randoms[j_random]|. @d next_random==if j_random=0 then new_randoms else decr(j_random) @p procedure new_randoms; var @!k:0..54; {index into |randoms|} @!x:integer; {accumulator} begin for k:=0 to 23 do begin x:=randoms[k]-randoms[k+31]; if x<0 then x:=x+fraction_one; randoms[k]:=x; end; for k:=24 to 54 do begin x:=randoms[k]-randoms[k-24]; if x<0 then x:=x+fraction_one; randoms[k]:=x; end; j_random:=54; end; @ To initialize the |randoms| table, we call the following routine. @p procedure init_randoms(@!seed:integer); var @!j,@!jj,@!k:integer; {more or less random integers} @!i:0..54; {index into |randoms|} begin j:=abs(seed); while j>=fraction_one do j:=halfp(j); k:=1; for i:=0 to 54 do begin jj:=k; k:=j-k; j:=jj; if k<0 then k:=k+fraction_one; randoms[(i*21)mod 55]:=j; end; new_randoms; new_randoms; new_randoms; {``warm up'' the array} end; @ To produce a uniform random number in the range |0<=u=u>x| or |0=u=x|, given a |scaled| value~|x|, we proceed as shown here. Note that the call of |take_frac| will produce the values 0 and~|x| with about half the probability that it will produce any other particular values between 0 and~|x|, because it rounds its answers. @p function unif_rand(@!x:integer):integer; var @!y:integer; {trial value} begin next_random; y:=take_frac(abs(x),randoms[j_random]); if y=abs(x) then unif_rand:=0 else if x>0 then unif_rand:=y else unif_rand:=-y; end; @ Finally, a normal deviate with mean zero and unit standard deviation can readily be obtained with the ratio method (Algorithm 3.4.1R in {\sl The Art of Computer Programming\/}). @p function norm_rand:integer; var @!x,@!u,@!l:integer; {what the book would call $2^{16}X$, $2^{28}U$, and $-2^{24}\ln U$} begin repeat repeat next_random; x:=take_frac(112429,randoms[j_random]-fraction_half); {$2^{16}\sqrt{8/e}\approx 112428.82793$} next_random; u:=randoms[j_random]; until abs(x)=0; norm_rand:=x; end; @* \[8] Packed data. In order to make efficient use of storage space, \TeX\ bases its major data structures on a |memory_word|, which contains either a (signed) integer, possibly scaled, or a (signed) |glue_ratio|, or a small number of fields that are one half or one quarter of the size used for storing integers. If |x| is a variable of type |memory_word|, it contains up to four fields that can be referred to as follows: $$\vbox{\halign{\hfil#&#\hfil&#\hfil\cr |x|&.|int|&(an |integer|)\cr |x|&.|sc|\qquad&(a |scaled| integer)\cr |x|&.|gr|&(a |glue_ratio|)\cr |x.hh.lh|, |x.hh|&.|rh|&(two halfword fields)\cr |x.hh.b0|, |x.hh.b1|, |x.hh|&.|rh|&(two quarterword fields, one halfword field)\cr |x.qqqq.b0|, |x.qqqq.b1|, |x.qqqq|&.|b2|, |x.qqqq.b3|\hskip-100pt &\qquad\qquad\qquad(four quarterword fields)\cr}}$$ This is somewhat cumbersome to write, and not very readable either, but macros will be used to make the notation shorter and more transparent. The \PASCAL\ code below gives a formal definition of |memory_word| and its subsidiary types, using packed variant records. \TeX\ makes no assumptions about the relative positions of the fields within a word. We are assuming 32-bit integers, a halfword must contain at least 32 bits, and a quarterword must contain at least 16 bits. @^system dependencies@> N.B.: Valuable memory space will be dreadfully wasted unless \TeX\ is compiled by a \PASCAL\ that packs all of the |memory_word| variants into the space of a single integer. This means, for example, that |glue_ratio| words should be |short_real| instead of |real| on some computers. Some \PASCAL\ compilers will pack an integer whose subrange is `|0..255|' into an eight-bit field, but others insist on allocating space for an additional sign bit; on such systems you can get 256 values into a quarterword only if the subrange is `|-128..127|'. The present implementation tries to accommodate as many variations as possible, so it makes few assumptions. If integers having the subrange `|min_quarterword..max_quarterword|' can be packed into a quarterword, and if integers having the subrange `|min_halfword..max_halfword|' can be packed into a halfword, everything should work satisfactorily. It is usually most efficient to have |min_quarterword=min_halfword=0|, so one should try to achieve this unless it causes a severe problem. The values defined here are recommended for most 32-bit computers. We cannot use the full range of 32 bits in a halfword, because we have to allow negative values for potential backend tricks like web2c's dynamic allocation, a