% $Id: mp.web,v 1.8 2005/08/24 10:54:02 taco Exp $ % MetaPost, by John Hobby. Public domain. % Much of this program was copied with permission from MF.web Version 1.9 % It interprets a language very similar to D.E. Knuth's METAFONT, but with % changes designed to make it more suitable for PostScript output. % TeX is a trademark of the American Mathematical Society. % METAFONT is a trademark of Addison-Wesley Publishing Company. % PostScript is a trademark of Adobe Systems Incorporated. % Here is TeX material that gets inserted after \input webmac \def\hang{\hangindent 3em\noindent\ignorespaces} \def\textindent#1{\hangindent2.5em\noindent\hbox to2.5em{\hss#1 }\ignorespaces} \def\PASCAL{Pascal} \def\ps{PostScript} \def\ph{\hbox{Pascal-H}} \def\psqrt#1{\sqrt{\mathstrut#1}} \def\k{_{k+1}} \def\pct!{{\char`\%}} % percent sign in ordinary text \font\tenlogo=logo10 % font used for the METAFONT logo \font\logos=logosl10 \def\MF{{\tenlogo META}\-{\tenlogo FONT}} \def\MP{{\tenlogo META}\-{\tenlogo POST}} \def\<#1>{$\langle#1\rangle$} \def\section{\mathhexbox278} \let\swap=\leftrightarrow \def\round{\mathop{\rm round}\nolimits} \mathchardef\vb="026A % synonym for `\|' \def\(#1){} % this is used to make section names sort themselves better \def\9#1{} % this is used for sort keys in the index via @@:sort key}{entry@@> \outer\def\N#1. \[#2]#3.{\MN#1.\vfil\eject % begin starred section \def\rhead{PART #2:\uppercase{#3}} % define running headline \message{*\modno} % progress report \edef\next{\write\cont{\Z{\?#2]#3}{\modno}{\the\pageno}}}\next \ifon\startsection{\bf\ignorespaces#3.\quad}\ignorespaces} \let\?=\relax % we want to be able to \write a \? \def\title{MetaPost} \def\topofcontents{\hsize 5.5in \vglue -30pt plus 1fil minus 1.5in \def\?##1]{\hbox to 1in{\hfil##1.\ }} } \def\botofcontents{\vskip 0pt plus 1fil minus 1.5in} \pageno=3 \def\glob{13} % this should be the section number of "" \def\gglob{20, 26} % this should be the next two sections of "" @* \[1] Introduction. This is \MP, a graphics-language processor based on D. E. Knuth's \MF. The \PASCAL\ program that follows defines a standard version @:PASCAL}{\PASCAL@> of \MP\ that is designed to be highly portable so that identical output will be obtainable on a great variety of computers. The main purpose of the following program is to explain the algorithms of \MP\ as clearly as possible. As a result, the program will not necessarily be very efficient when a particular \PASCAL\ compiler has translated it into a particular machine language. However, the program has been written so that it can be tuned to run efficiently in a wide variety of operating environments by making comparatively few changes. Such flexibility is possible because the documentation that follows is written in the \.{WEB} language, which is at a higher level than \PASCAL; the preprocessing step that converts \.{WEB} to \PASCAL\ is able to introduce most of the necessary refinements. Semi-automatic translation to other languages is also feasible, because the program below does not make extensive use of features that are peculiar to \PASCAL. A large piece of software like \MP\ has inherent complexity that cannot be reduced below a certain level of difficulty, although each individual part is fairly simple by itself. The \.{WEB} language is intended to make the algorithms as readable as possible, by reflecting the way the individual program pieces fit together and by providing the cross-references that connect different parts. Detailed comments about what is going on, and about why things were done in certain ways, have been liberally sprinkled throughout the program. These comments explain features of the implementation, but they rarely attempt to explain the \MP\ language itself, since the reader is supposed to be familiar with {\sl The {\logos METAFONT\/}book} as well as the manual @.WEB@> @:METAFONTbook}{\sl The {\logos METAFONT\/}book@> {\sl A User's Manual for MetaPost}, Computing Science Technical Report 162, AT\AM T Bell Laboratories. @ The present implementation is a preliminary version, but the possibilities for new features are limited by the desire to remain as nearly compatible with \MF\ as possible. On the other hand, the \.{WEB} description can be extended without changing the core of the program, and it has been designed so that such extensions are not extremely difficult to make. The |banner| string defined here should be changed whenever \MP\ undergoes any modifications, so that it will be clear which version of \MP\ might be the guilty party when a problem arises. @^extensions to \MP@> @^system dependencies@> @d banner=='This is MetaPost, Version 1.005' {printed when \MP\ starts} @d metapost_version=="1.005" @ Different \PASCAL s have slightly different conventions, and the present @!@:PASCAL H}{\ph@> program is expressed in a version of \PASCAL\ that D. E. Knuth used for \MF. Constructions that apply to this particular compiler, which we shall call \ph, should help the reader see how to make an appropriate interface for other systems if necessary. (\ph\ is Charles Hedrick's modification of a compiler @^Hedrick, Charles Locke@> for the DECsystem-10 that was originally developed at the University of Hamburg; cf.\ {\sl SOFTWARE---Practice \AM\ Experience \bf6} (1976), 29--42. The \MP\ program below is intended to be adaptable, without extensive changes, to most other versions of \PASCAL\ and commonly used \PASCAL-to-C translators, so it does not fully @!@:C@> use the admirable features of \ph. Indeed, a conscious effort has been made here to avoid using several idiosyncratic features of standard \PASCAL\ itself, so that most of the code can be translated mechanically into other high-level languages. For example, the `\&{with}' and `\\{new}' features are not used, nor are pointer types, set types, or enumerated scalar types; there are no `\&{var}' parameters, except in the case of files; there are no tag fields on variant records; there are no |real| variables; no procedures are declared local to other procedures.) The portions of this program that involve system-dependent code, where changes might be necessary because of differences between \PASCAL\ compilers and/or differences between operating systems, can be identified by looking at the sections whose numbers are listed under `system dependencies' in the index. Furthermore, the index entries for `dirty \PASCAL' list all places where the restrictions of \PASCAL\ have not been followed perfectly, for one reason or another. @!@^system dependencies@> @!@^dirty \PASCAL@> @ The program begins with a normal \PASCAL\ program heading, whose components will be filled in later, using the conventions of \.{WEB}. @.WEB@> For example, the portion of the program called `\X\glob:Global variables\X' below will be replaced by a sequence of variable declarations that starts in $\section\glob$ of this documentation. In this way, we are able to define each individual global variable when we are prepared to understand what it means; we do not have to define all of the globals at once. Cross references in $\section\glob$, where it says ``See also sections \gglob, \dots,'' also make it possible to look at the set of all global variables, if desired. Similar remarks apply to the other portions of the program heading. Actually the heading shown here is not quite normal: The |program| line does not mention any |output| file, because \ph\ would ask the \MP\ user to specify a file name if |output| were specified here. @:PASCAL H}{\ph@> @^system dependencies@> @d mtype==t@&y@&p@&e {this is a \.{WEB} coding trick:} @f mtype==type {`\&{mtype}' will be equivalent to `\&{type}'} @f type==true {but `|type|' will not be treated as a reserved word} @p @t\4@>@@/ program MP; {all file names are defined dynamically} label @@/ const @@/ mtype @@/ var @@/ @# procedure initialize; {this procedure gets things started properly} var @@/ begin @@/ end;@# @t\4@>@@/ @t\4@>@@/ @ The overall \MP\ program begins with the heading just shown, after which comes a bunch of procedure declarations and function declarations. Finally we will get to the main program, which begins with the comment `|start_here|'. If you want to skip down to the main program now, you can look up `|start_here|' in the index. But the author suggests that the best way to understand this program is to follow pretty much the order of \MP's components as they appear in the \.{WEB} description you are now reading, since the present ordering is intended to combine the advantages of the ``bottom up'' and ``top down'' approaches to the problem of understanding a somewhat complicated system. @ Three labels must be declared in the main program, so we give them symbolic names. @d start_of_MP=1 {go here when \MP's variables are initialized} @d end_of_MP=9998 {go here to close files and terminate gracefully} @d final_end=9999 {this label marks the ending of the program} @= start_of_MP@t\hskip-2pt@>, end_of_MP@t\hskip-2pt@>,@,final_end; {key control points} @ Some of the code below is intended to be used only when diagnosing the strange behavior that sometimes occurs when \MP\ is being installed or when system wizards are fooling around with \MP\ without quite knowing what they are doing. Such code will not normally be compiled; it is delimited by the codewords `$|debug|\ldots|gubed|$', with apologies to people who wish to preserve the purity of English. Similarly, there is some conditional code delimited by `$|stat|\ldots|tats|$' that is intended for use when statistics are to be kept about \MP's memory usage. @^debugging@> @d debug==@{ {change this to `$\\{debug}\equiv\null$' when debugging} @d gubed==@t@>@} {change this to `$\\{gubed}\equiv\null$' when debugging} @f debug==begin @f gubed==end @# @d stat==@{ {change this to `$\\{stat}\equiv\null$' when gathering usage statistics} @d tats==@t@>@} {change this to `$\\{tats}\equiv\null$' when gathering usage statistics} @f stat==begin @f tats==end @ This program has two important variations: (1) There is a long and slow version called \.{INIMP}, which does the extra calculations needed to @.INIMP@> initialize \MP's internal tables; and (2)~there is a shorter and faster production version, which cuts the initialization to a bare minimum. Parts of the program that are needed in (1) but not in (2) are delimited by the codewords `$|init|\ldots|tini|$'. @d init== {change this to `$\\{init}\equiv\.{@@\{}$' in the production version} @d tini== {change this to `$\\{tini}\equiv\.{@@\}}$' in the production version} @f init==begin @f tini==end @ If the first character of a \PASCAL\ comment is a dollar sign, \ph\ treats the comment as a list of ``compiler directives'' that will affect the translation of this program into machine language. The directives shown below specify full checking and inclusion of the \PASCAL\ debugger when \MP\ is being debugged, but they cause range checking and other redundant code to be eliminated when the production system is being generated. Arithmetic overflow will be detected in all cases. @:PASCAL H}{\ph@> @^system dependencies@> @^Overflow in arithmetic@> @= @{@&$C-,A+,D-@} {no range check, catch arithmetic overflow, no debug overhead} @!debug @{@&$C+,D+@}@+ gubed {but turn everything on when debugging} @ This \MP\ implementation conforms to the rules of the {\sl Pascal User @:PASCAL}{\PASCAL@> @^system dependencies@> Manual} published by Jensen and Wirth in 1975, except where system-dependent @^Wirth, Niklaus@> @^Jensen, Kathleen@> code is necessary to make a useful system program, and except in another respect where such conformity would unnecessarily obscure the meaning and clutter up the code: We assume that |case| statements may include a default case that applies if no matching label is found. Thus, we shall use constructions like $$\vbox{\halign{\ignorespaces#\hfil\cr |case x of|\cr 1: $\langle\,$code for $x=1\,\rangle$;\cr 3: $\langle\,$code for $x=3\,\rangle$;\cr |othercases| $\langle\,$code for |x<>1| and |x<>3|$\,\rangle$\cr |endcases|\cr}}$$ since most \PASCAL\ compilers have plugged this hole in the language by incorporating some sort of default mechanism. For example, the \ph\ compiler allows `|others|:' as a default label, and other \PASCAL s allow syntaxes like `\&{else}' or `\&{otherwise}' or `\\{otherwise}:', etc. The definitions of |othercases| and |endcases| should be changed to agree with local conventions. Note that no semicolon appears before |endcases| in this program, so the definition of |endcases| should include a semicolon if the compiler wants one. (Of course, if no default mechanism is available, the |case| statements of \MP\ will have to be laboriously extended by listing all remaining cases. People who are stuck with such \PASCAL s have, in fact, done this, successfully but not happily!) @:PASCAL H}{\ph@> @d othercases == others: {default for cases not listed explicitly} @d endcases == @+end {follows the default case in an extended |case| statement} @f othercases == else @f endcases == end @ The following parameters can be changed at compile time to extend or reduce \MP's capacity. They may have different values in \.{INIMP} and in production versions of \MP. @.INIMP@> @^system dependencies@> @= @!mem_max=30000; {greatest index in \MP's internal |mem| array; must be strictly less than |max_halfword|; must be equal to |mem_top| in \.{INIMP}, otherwise |>=mem_top|} @!max_internal=100; {maximum number of internal quantities} @!buf_size=500; {maximum number of characters simultaneously present in current lines of open files; must not exceed |max_halfword|} @!error_line=72; {width of context lines on terminal error messages} @!half_error_line=42; {width of first lines of contexts in terminal error messages; should be between 30 and |error_line-15|} @!max_print_line=79; {width of longest text lines output; should be at least 60} @!emergency_line_length=255; {\ps\ output lines can be this long in unusual circumstances} @!stack_size=30; {maximum number of simultaneous input sources} @!max_read_files=4; {maximum number of simultaneously open \&{readfrom} files} @!max_strings=2500; {maximum number of strings; must not exceed |max_halfword|} @!string_vacancies=9000; {the minimum number of characters that should be available for the user's identifier names and strings, after \MP's own error messages are stored} @!strings_vacant=1000; {the minimum number of strings that should be available} @!pool_size=32000; {maximum number of characters in strings, including all error messages and help texts, and the names of all identifiers; must exceed |string_vacancies| by the total length of \MP's own strings, which is currently about 22000} @!font_max=50; {maximum font number for included text fonts} @!font_mem_size=10000; {number of words for \.{TFM} information for text fonts} @!file_name_size=40; {file names shouldn't be longer than this} @!pool_name='MPlib:MP.POOL '; {string of length |file_name_size|; tells where the string pool appears} @.MPlib@> @!ps_tab_name='MPlib:PSFONTS.MAP '; {string of length |file_name_size|; locates font name translation table} @!path_size=300; {maximum number of knots between breakpoints of a path} @!bistack_size=785; {size of stack for bisection algorithms; should probably be left at this value} @!header_size=100; {maximum number of \.{TFM} header words, times~4} @!lig_table_size=5000; {maximum number of ligature/kern steps, must be at least 255 and at most 32510} @!max_kerns=500; {maximum number of distinct kern amounts} @!max_font_dimen=50; {maximum number of \&{fontdimen} parameters} @ Like the preceding parameters, the following quantities can be changed at compile time to extend or reduce \MP's capacity. But if they are changed, it is necessary to rerun the initialization program \.{INIMP} @.INIMP@> to generate new tables for the production \MP\ program. One can't simply make helter-skelter changes to the following constants, since certain rather complex initialization numbers are computed from them. They are defined here using \.{WEB} macros, instead of being put into \PASCAL's |const| list, in order to emphasize this distinction. @d mem_min=0 {smallest index in the |mem| array, must not be less than |min_halfword|} @d mem_top==30000 {largest index in the |mem| array dumped by \.{INIMP}; must be substantially larger than |mem_min| and not greater than |mem_max|} @d hash_size=2100 {maximum number of symbolic tokens, must be less than |max_halfword-3*param_size|} @d hash_prime=1777 {a prime number equal to about 85\pct! of |hash_size|} @d max_in_open=6 {maximum number of input files and error insertions that can be going on simultaneously} @d param_size=150 {maximum number of simultaneous macro parameters} @d max_write_files=4 {maximum number of simultaneously open \&{write} files} @^system dependencies@> @ In case somebody has inadvertently made bad settings of the ``constants,'' \MP\ checks them using a global variable called |bad|. This is the first of many sections of \MP\ where global variables are defined. @= @!bad:integer; {is some ``constant'' wrong?} @ Later on we will say `\ignorespaces|if mem_max>=max_halfword then bad:=10|', or something similar. (We can't do that until |max_halfword| has been defined.) @= bad:=0; if (half_error_line<30)or(half_error_line>error_line-15) then bad:=1; if max_print_line<60 then bad:=2; if emergency_line_lengthmem_top then bad:=4; if hash_prime>hash_size then bad:=5; if header_size mod 4 <> 0 then bad:=6; if(lig_table_size<255)or(lig_table_size>32510)then bad:=7; @ Labels are given symbolic names by the following definitions, so that occasional |goto| statements will be meaningful. We insert the label `|exit|:' just before the `\ignorespaces|end|\unskip' of a procedure in which we have used the `|return|' statement defined below; the label `|restart|' is occasionally used at the very beginning of a procedure; and the label `|reswitch|' is occasionally used just prior to a |case| statement in which some cases change the conditions and we wish to branch to the newly applicable case. Loops that are set up with the |loop| construction defined below are commonly exited by going to `|done|' or to `|found|' or to `|not_found|', and they are sometimes repeated by going to `|continue|'. If two or more parts of a subroutine start differently but end up the same, the shared code may be gathered together at `|common_ending|'. Incidentally, this program never declares a label that isn't actually used, because some fussy \PASCAL\ compilers will complain about redundant labels. @d exit=10 {go here to leave a procedure} @d restart=20 {go here to start a procedure again} @d reswitch=21 {go here to start a case statement again} @d continue=22 {go here to resume a loop} @d done=30 {go here to exit a loop} @d done1=31 {like |done|, when there is more than one loop} @d done2=32 {for exiting the second loop in a long block} @d done3=33 {for exiting the third loop in a very long block} @d done4=34 {for exiting the fourth loop in an extremely long block} @d done5=35 {for exiting the fifth loop in an immense block} @d done6=36 {for exiting the sixth loop in a block} @d found=40 {go here when you've found it} @d found1=41 {like |found|, when there's more than one per routine} @d found2=42 {like |found|, when there's more than two per routine} @d found3=43 {like |found|, when there's more than three per routine} @d not_found=45 {go here when you've found nothing} @d common_ending=50 {go here when you want to merge with another branch} @ Here are some macros for common programming idioms. @d incr(#) == #:=#+1 {increase a variable by unity} @d decr(#) == #:=#-1 {decrease a variable by unity} @d negate(#) == #:=-# {change the sign of a variable} @d double(#) == #:=#+# {multiply a variable by two} @d loop == @+ while true do@+ {repeat over and over until a |goto| happens} @f loop == xclause {\.{WEB}'s |xclause| acts like `\ignorespaces|while true do|\unskip'} @d do_nothing == {empty statement} @d return == goto exit {terminate a procedure call} @f return == nil {\.{WEB} will henceforth say |return| instead of \\{return}} @* \[2] The character set. In order to make \MP\ readily portable to a wide variety of computers, all of its input text is converted to an internal eight-bit code that includes standard ASCII, the ``American Standard Code for Information Interchange.'' This conversion is done immediately when each character is read in. Conversely, characters are converted from ASCII to the user's external representation just before they are output to a text file. @^ASCII code@> Such an internal code is relevant to users of \MP\ only with respect to the \&{char} and \&{ASCII} operations, and the comparison of strings. @ Characters of text that have been converted to \MP's internal form are said to be of type |ASCII_code|, which is a subrange of the integers. @= @!ASCII_code=0..255; {eight-bit numbers} @ The original \PASCAL\ compiler was designed in the late 60s, when six-bit character sets were common, so it did not make provision for lowercase letters. Nowadays, of course, we need to deal with both capital and small letters in a convenient way, especially in a program for font design; so the present specification of \MP\ has been written under the assumption that the \PASCAL\ compiler and run-time system permit the use of text files with more than 64 distinguishable characters. More precisely, we assume that the character set contains at least the letters and symbols associated with ASCII codes @'40 through @'176; all of these characters are now available on most computer terminals. Since we are dealing with more characters than were present in the first \PASCAL\ compilers, we have to decide what to call the associated data type. Some \PASCAL s use the original name |char| for the characters in text files, even though there now are more than 64 such characters, while other \PASCAL s consider |char| to be a 64-element subrange of a larger data type that has some other name. In order to accommodate this difference, we shall use the name |text_char| to stand for the data type of the characters that are converted to and from |ASCII_code| when they are input and output. We shall also assume that |text_char| consists of the elements |chr(first_text_char)| through |chr(last_text_char)|, inclusive. The following definitions should be adjusted if necessary. @^system dependencies@> @d text_char == char {the data type of characters in text files} @d first_text_char=0 {ordinal number of the smallest element of |text_char|} @d last_text_char=255 {ordinal number of the largest element of |text_char|} @= @!i:integer; @ The \MP\ processor converts between ASCII code and the user's external character set by means of arrays |xord| and |xchr| that are analogous to \PASCAL's |ord| and |chr| functions. @= @!xord: array [text_char] of ASCII_code; {specifies conversion of input characters} @!xchr: array [ASCII_code] of text_char; {specifies conversion of output characters} @ Since we are assuming that our \PASCAL\ system is able to read and write the visible characters of standard ASCII (although not necessarily using the ASCII codes to represent them), the following assignment statements initialize the standard part of the |xchr| array properly, without needing any system-dependent changes. On the other hand, it is possible to implement \MP\ with less complete character sets, and in such cases it will be necessary to change something here. @^system dependencies@> @= xchr[@'40]:=' '; xchr[@'41]:='!'; xchr[@'42]:='"'; xchr[@'43]:='#'; xchr[@'44]:='$'; xchr[@'45]:='%'; xchr[@'46]:='&'; xchr[@'47]:='''';@/ xchr[@'50]:='('; xchr[@'51]:=')'; xchr[@'52]:='*'; xchr[@'53]:='+'; xchr[@'54]:=','; xchr[@'55]:='-'; xchr[@'56]:='.'; xchr[@'57]:='/';@/ xchr[@'60]:='0'; xchr[@'61]:='1'; xchr[@'62]:='2'; xchr[@'63]:='3'; xchr[@'64]:='4'; xchr[@'65]:='5'; xchr[@'66]:='6'; xchr[@'67]:='7';@/ xchr[@'70]:='8'; xchr[@'71]:='9'; xchr[@'72]:=':'; xchr[@'73]:=';'; xchr[@'74]:='<'; xchr[@'75]:='='; xchr[@'76]:='>'; xchr[@'77]:='?';@/ xchr[@'100]:='@@'; xchr[@'101]:='A'; xchr[@'102]:='B'; xchr[@'103]:='C'; xchr[@'104]:='D'; xchr[@'105]:='E'; xchr[@'106]:='F'; xchr[@'107]:='G';@/ xchr[@'110]:='H'; xchr[@'111]:='I'; xchr[@'112]:='J'; xchr[@'113]:='K'; xchr[@'114]:='L'; xchr[@'115]:='M'; xchr[@'116]:='N'; xchr[@'117]:='O';@/ xchr[@'120]:='P'; xchr[@'121]:='Q'; xchr[@'122]:='R'; xchr[@'123]:='S'; xchr[@'124]:='T'; xchr[@'125]:='U'; xchr[@'126]:='V'; xchr[@'127]:='W';@/ xchr[@'130]:='X'; xchr[@'131]:='Y'; xchr[@'132]:='Z'; xchr[@'133]:='['; xchr[@'134]:='\'; xchr[@'135]:=']'; xchr[@'136]:='^'; xchr[@'137]:='_';@/ xchr[@'140]:='`'; xchr[@'141]:='a'; xchr[@'142]:='b'; xchr[@'143]:='c'; xchr[@'144]:='d'; xchr[@'145]:='e'; xchr[@'146]:='f'; xchr[@'147]:='g';@/ xchr[@'150]:='h'; xchr[@'151]:='i'; xchr[@'152]:='j'; xchr[@'153]:='k'; xchr[@'154]:='l'; xchr[@'155]:='m'; xchr[@'156]:='n'; xchr[@'157]:='o';@/ xchr[@'160]:='p'; xchr[@'161]:='q'; xchr[@'162]:='r'; xchr[@'163]:='s'; xchr[@'164]:='t'; xchr[@'165]:='u'; xchr[@'166]:='v'; xchr[@'167]:='w';@/ xchr[@'170]:='x'; xchr[@'171]:='y'; xchr[@'172]:='z'; xchr[@'173]:='{'; xchr[@'174]:='|'; xchr[@'175]:='}'; xchr[@'176]:='~';@/ @ The ASCII code is ``standard'' only to a certain extent, since many computer installations have found it advantageous to have ready access to more than 94 printing characters. If \MP\ is being used on a garden-variety \PASCAL\ for which only standard ASCII codes will appear in the input and output files, it doesn't really matter what codes are specified in |xchr[0..@'37]|, but the safest policy is to blank everything out by using the code shown below. However, other settings of |xchr| will make \MP\ more friendly on computers that have an extended character set, so that users can type things like `\.^^Z' instead of `\.{<>}'. People with extended character sets can assign codes arbitrarily, giving an |xchr| equivalent to whatever characters the users of \MP\ are allowed to have in their input files. Appropriate changes to \MP's |char_class| table should then be made. (Unlike \TeX, each installation of \MP\ has a fixed assignment of category codes, called the |char_class|.) Such changes make portability of programs more difficult, so they should be introduced cautiously if at all. @^character set dependencies@> @^system dependencies@> @= for i:=0 to @'37 do xchr[i]:=' '; for i:=@'177 to @'377 do xchr[i]:=' '; @ The following system-independent code makes the |xord| array contain a suitable inverse to the information in |xchr|. Note that if |xchr[i]=xchr[j]| where |i= for i:=first_text_char to last_text_char do xord[chr(i)]:=@'177; for i:=@'200 to @'377 do xord[xchr[i]]:=i; for i:=0 to @'176 do xord[xchr[i]]:=i; @* \[3] Input and output. The bane of portability is the fact that different operating systems treat input and output quite differently, perhaps because computer scientists have not given sufficient attention to this problem. People have felt somehow that input and output are not part of ``real'' programming. Well, it is true that some kinds of programming are more fun than others. With existing input/output conventions being so diverse and so messy, the only sources of joy in such parts of the code are the rare occasions when one can find a way to make the program a little less bad than it might have been. We have two choices, either to attack I/O now and get it over with, or to postpone I/O until near the end. Neither prospect is very attractive, so let's get it over with. The basic operations we need to do are (1)~inputting and outputting of text, to or from a file or the user's terminal; (2)~inputting and outputting of eight-bit bytes, to or from a file; (3)~instructing the operating system to initiate (``open'') or to terminate (``close'') input or output from a specified file; (4)~testing whether the end of an input file has been reached; (5)~display of bits on the user's screen. The bit-display operation will be discussed in a later section; we shall deal here only with more traditional kinds of I/O. \MP\ needs to deal with two kinds of files. We shall use the term |alpha_file| for a file that contains textual data, and the term |byte_file| for a file that contains eight-bit binary information. These two types turn out to be the same on many computers, but sometimes there is a significant distinction, so we shall be careful to distinguish between them. Standard protocols for transferring such files from computer to computer, via high-speed networks, are now becoming available to more and more communities of users. The program actually makes use also of a third kind of file, called a |word_file|, when dumping and reloading mem information for its own initialization. We shall define a word file later; but it will be possible for us to specify simple operations on word files before they are defined. @= @!eight_bits=0..255; {unsigned one-byte quantity} @!alpha_file=packed file of text_char; {files that contain textual data} @!byte_file=packed file of eight_bits; {files that contain binary data} @ Most of what we need to do with respect to input and output can be handled by the I/O facilities that are standard in \PASCAL, i.e., the routines called |get|, |put|, |eof|, and so on. But standard \PASCAL\ does not allow file variables to be associated with file names that are determined at run time, so it cannot be used to implement \MP; some sort of extension to \PASCAL's ordinary |reset| and |rewrite| is crucial for our purposes. We shall assume that |name_of_file| is a variable of an appropriate type such that the \PASCAL\ run-time system being used to implement \MP\ can open a file whose external name is specified by |name_of_file|. @^system dependencies@> @= @!name_of_file:packed array[1..file_name_size] of char;@;@/ {on some systems this may be a \&{record} variable} @!name_length:0..file_name_size;@/{this many characters are actually relevant in |name_of_file| (the rest are blank)} @ The \ph\ compiler with which the original version of \MF\ was prepared extends the rules of \PASCAL\ in a very convenient way. To open file~|f|, we can write $$\vbox{\halign{#\hfil\qquad&#\hfil\cr |reset(f,@t\\{name}@>,'/O')|&for input;\cr |rewrite(f,@t\\{name}@>,'/O')|&for output.\cr}}$$ The `\\{name}' parameter, which is of type `\ignorespaces|packed array[@t\<\\{any}>@>] of text_char|', stands for the name of the external file that is being opened for input or output. Blank spaces that might appear in \\{name} are ignored. The `\.{/O}' parameter tells the operating system not to issue its own error messages if something goes wrong. If a file of the specified name cannot be found, or if such a file cannot be opened for some other reason (e.g., someone may already be trying to write the same file), we will have |@!erstat(f)<>0| after an unsuccessful |reset| or |rewrite|. This allows \MP\ to undertake appropriate corrective action. @:PASCAL H}{\ph@> @^system dependencies@> \MP's file-opening procedures return |false| if no file identified by |name_of_file| could be opened. @d reset_OK(#)==erstat(#)=0 @d rewrite_OK(#)==erstat(#)=0 @p function a_open_in(var @!f:alpha_file):boolean; {open a text file for input} begin reset(f,name_of_file,'/O'); a_open_in:=reset_OK(f); end; @# function a_open_out(var @!f:alpha_file):boolean; {open a text file for output} begin rewrite(f,name_of_file,'/O'); a_open_out:=rewrite_OK(f); end; @# function b_open_in(var @!f:byte_file):boolean; {open a binary file for input} begin reset(f,name_of_file,'/O'); b_open_in:=reset_OK(f); end; @# function b_open_out(var @!f:byte_file):boolean; {open a binary file for output} begin rewrite(f,name_of_file,'/O'); b_open_out:=rewrite_OK(f); end; @# function w_open_in(var @!f:word_file):boolean; {open a word file for input} begin reset(f,name_of_file,'/O'); w_open_in:=reset_OK(f); end; @# function w_open_out(var @!f:word_file):boolean; {open a word file for output} begin rewrite(f,name_of_file,'/O'); w_open_out:=rewrite_OK(f); end; @ Files can be closed with the \ph\ routine `|close(f)|', which @:PASCAL H}{\ph@> @^system dependencies@> should be used when all input or output with respect to |f| has been completed. This makes |f| available to be opened again, if desired; and if |f| was used for output, the |close| operation makes the corresponding external file appear on the user's area, ready to be read. @p procedure a_close(var @!f:alpha_file); {close a text file} begin close(f); end; @# procedure b_close(var @!f:byte_file); {close a binary file} begin close(f); end; @# procedure w_close(var @!f:word_file); {close a word file} begin close(f); end; @ Binary input and output are done with \PASCAL's ordinary |get| and |put| procedures, so we don't have to make any other special arrangements for binary~I/O. Text output is also easy to do with standard \PASCAL\ routines. The treatment of text input is more difficult, however, because of the necessary translation to |ASCII_code| values. \MP's conventions should be efficient, and they should blend nicely with the user's operating environment. @ Input from text files is read one line at a time, using a routine called |input_ln|. This function is defined in terms of global variables called |buffer|, |first|, and |last| that will be described in detail later; for now, it suffices for us to know that |buffer| is an array of |ASCII_code| values, and that |first| and |last| are indices into this array representing the beginning and ending of a line of text. @= @!buffer:array[0..buf_size] of ASCII_code; {lines of characters being read} @!first:0..buf_size; {the first unused position in |buffer|} @!last:0..buf_size; {end of the line just input to |buffer|} @!max_buf_stack:0..buf_size; {largest index used in |buffer|} @ The |input_ln| function brings the next line of input from the specified field into available positions of the buffer array and returns the value |true|, unless the file has already been entirely read, in which case it returns |false| and sets |last:=first|. In general, the |ASCII_code| numbers that represent the next line of the file are input into |buffer[first]|, |buffer[first+1]|, \dots, |buffer[last-1]|; and the global variable |last| is set equal to |first| plus the length of the line. Trailing blanks are removed from the line; thus, either |last=first| (in which case the line was entirely blank) or |buffer[last-1]<>" "|. @^inner loop@> An overflow error is given, however, if the normal actions of |input_ln| would make |last>=buf_size|; this is done so that other parts of \MP\ can safely look at the contents of |buffer[last+1]| without overstepping the bounds of the |buffer| array. Upon entry to |input_ln|, the condition |first=max_buf_stack then begin max_buf_stack:=last+1; if max_buf_stack=buf_size then @; end; buffer[last]:=xord[f^]; get(f); incr(last); if buffer[last-1]<>" " then last_nonblank:=last; end; last:=last_nonblank; input_ln:=true; end; end; @ The user's terminal acts essentially like other files of text, except that it is used both for input and for output. When the terminal is considered an input file, the file variable is called |term_in|, and when it is considered an output file the file variable is |term_out|. @^system dependencies@> @= @!term_in:alpha_file; {the terminal as an input file} @!term_out:alpha_file; {the terminal as an output file} @ Here is how to open the terminal files in \ph. The `\.{/I}' switch suppresses the first |get|. @:PASCAL H}{\ph@> @^system dependencies@> @d t_open_in==reset(term_in,'TTY:','/O/I') {open the terminal for text input} @d t_open_out==rewrite(term_out,'TTY:','/O') {open the terminal for text output} @ Sometimes it is necessary to synchronize the input/output mixture that happens on the user's terminal, and three system-dependent procedures are used for this purpose. The first of these, |update_terminal|, is called when we want to make sure that everything we have output to the terminal so far has actually left the computer's internal buffers and been sent. The second, |clear_terminal|, is called when we wish to cancel any input that the user may have typed ahead (since we are about to issue an unexpected error message). The third, |wake_up_terminal|, is supposed to revive the terminal if the user has disabled it by some instruction to the operating system. The following macros show how these operations can be specified in \ph: @:PASCAL H}{\ph@> @^system dependencies@> @d update_terminal == break(term_out) {empty the terminal output buffer} @d clear_terminal == break_in(term_in,true) {clear the terminal input buffer} @d wake_up_terminal == do_nothing {cancel the user's cancellation of output} @ We need a special routine to read the first line of \MP\ input from the user's terminal. This line is different because it is read before we have opened the transcript file; there is sort of a ``chicken and egg'' problem here. If the user types `\.{input cmr10}' on the first line, or if some macro invoked by that line does such an \.{input}, the transcript file will be named `\.{cmr10.log}'; but if no \.{input} commands are performed during the first line of terminal input, the transcript file will acquire its default name `\.{mpout.log}'. (The transcript file will not contain error messages generated by the first line before the first \.{input} command.) The first line is even more special if we are lucky enough to have an operating system that treats \MP\ differently from a run-of-the-mill \PASCAL\ object program. It's nice to let the user start running a \MP\ job by typing a command line like `\.{MP cmr10}'; in such a case, \MP\ will operate as if the first line of input were `\.{cmr10}', i.e., the first line will consist of the remainder of the command line, after the part that invoked \MP. The first line is special also because it may be read before \MP\ has input a mem file. In such cases, normal error messages cannot yet be given. The following code uses concepts that will be explained later. @= if mem_ident=0 then begin write_ln(term_out,'Buffer size exceeded!'); goto final_end; @.Buffer size exceeded@> end else begin cur_input.loc_field:=first; cur_input.limit_field:=last-1; overflow("buffer size",buf_size); @:MetaPost capacity exceeded buffer size}{\quad buffer size@> end @ Different systems have different ways to get started. But regardless of what conventions are adopted, the routine that initializes the terminal should satisfy the following specifications: \yskip\textindent{1)}It should open file |term_in| for input from the terminal. (The file |term_out| will already be open for output to the terminal.) \textindent{2)}If the user has given a command line, this line should be considered the first line of terminal input. Otherwise the user should be prompted with `\.{**}', and the first line of input should be whatever is typed in response. \textindent{3)}The first line of input, which might or might not be a command line, should appear in locations |first| to |last-1| of the |buffer| array. \textindent{4)}The global variable |loc| should be set so that the character to be read next by \MP\ is in |buffer[loc]|. This character should not be blank, and we should have |loc @p function init_terminal:boolean; {gets the terminal input started} label exit; begin t_open_in; loop@+begin wake_up_terminal; write(term_out,'**'); update_terminal; @.**@> if not input_ln(term_in,true) then {this shouldn't happen} begin write_ln(term_out); write(term_out,'! End of file on the terminal... why?'); @.End of file on the terminal@> init_terminal:=false; return; end; loc:=first; while (loc which converts single-character strings into the ASCII code number of the single character involved, while it converts other strings into integers and builds a string pool file. Thus, when the string constant \.{"."} appears in the program below, \.{WEB} converts it into the integer 46, which is the ASCII code for a period, while \.{WEB} will convert a string like \.{"hello"} into some integer greater than~255. String number 46 will presumably be the single character `\..'\thinspace; but some ASCII codes have no standard visible representation, and \MP\ may need to be able to print an arbitrary ASCII character, so the first 256 strings are used to specify exactly what should be printed for each of the 256 possibilities. Elements of the |str_pool| array must be ASCII codes that can actually be printed; i.e., they must have an |xchr| equivalent in the local character set. (This restriction applies only to preloaded strings, not to those generated dynamically by the user.) Some \PASCAL\ compilers won't pack integers into a single byte unless the integers lie in the range |-128..127|. To accommodate such systems we access the string pool via macros that can easily be redefined. When accessing character dimensions for the \&{infont} operator, an explicit offset is used to convert from |pool_ASCII_code| to |ASCII_code|. @^system dependencies@> @d si(#) == # {convert from |ASCII_code| to |pool_ASCII_code|} @d so(#) == # {convert from |pool_ASCII_code| to |ASCII_code|} @d min_pool_ASCII=0 {added to an |ASCII_code| to make a |pool_ASCII_code|} @= @!pool_pointer = 0..pool_size; {for variables that point into |str_pool|} @!str_number = 0..max_strings; {for variables that point into |str_start|} @!pool_ASCII_code = 0..255; {elements of |str_pool| array} @ @= @!str_pool:packed array[pool_pointer] of pool_ASCII_code; {the characters} @!str_start : array[str_number] of pool_pointer; {the starting pointers} @!next_str : array[str_number] of str_number; {for linking strings in order} @!pool_ptr : pool_pointer; {first unused position in |str_pool|} @!str_ptr : str_number; {number of the current string being created} @!init_pool_ptr : pool_pointer; {the starting value of |pool_ptr|} @!init_str_use : str_number; {the initial number of strings in use} @!max_pool_ptr : pool_pointer; {the maximum so far of |pool_ptr|} @!max_str_ptr : str_number; {the maximum so far of |str_ptr|} @ Except for |strs_used_up|, the following string statistics are only maintained when code between |stat| $\ldots$ |tats| delimiters is not commented out: @= @!strs_used_up:integer; {strings in use or unused but not reclaimed} @!pool_in_use:integer; {total number of cells of |str_pool| actually in use} @!strs_in_use:integer; {total number of strings actually in use} @!max_pl_used:integer; {maximum |pool_in_use| so far} @!max_strs_used:integer; {maximum |strs_in_use| so far} @ Several of the elementary string operations are performed using \.{WEB} macros instead of \PASCAL\ procedures, because many of the operations are done quite frequently and we want to avoid the overhead of procedure calls. For example, here is a simple macro that computes the length of a string. @.WEB@> @d str_stop(#)==str_start[next_str[#]] {one cell past the end of string number \#} @d length(#)==(str_stop(#)-str_start[#]) {the number of characters in string \#} @ The length of the current string is called |cur_length|. If we decide that the current string is not needed, |flush_cur_string| resets |pool_ptr| so that |cur_length| becomes zero. @d cur_length == (pool_ptr - str_start[str_ptr]) @d flush_cur_string == pool_ptr:=str_start[str_ptr] @ Strings are created by appending character codes to |str_pool|. The |append_char| macro, defined here, does not check to see if the value of |pool_ptr| has gotten too high; this test is supposed to be made before |append_char| is used. To test if there is room to append |l| more characters to |str_pool|, we shall write |str_room(l)|, which tries to make sure there is enough room by compacting the string pool if necessary. If this does not work, |do_compaction| aborts \MP\ and gives an apologetic error message. @d append_char(#) == {put |ASCII_code| \# at the end of |str_pool|} begin str_pool[pool_ptr]:=si(#); incr(pool_ptr); end @d str_room(#) == {make sure that the pool hasn't overflowed} begin if pool_ptr+# > max_pool_ptr then if pool_ptr+# > pool_size then do_compaction(#) else max_pool_ptr:=pool_ptr+#; end @ The following routine is similar to |str_room(1)| but it uses the argument |pool_size| to prevent |do_compaction| from aborting when string space is exhausted. @= procedure unit_str_room; begin if pool_ptr>=pool_size then do_compaction(pool_size); if pool_ptr>=max_pool_ptr then max_pool_ptr:=pool_ptr+1; end; @ \MP's string expressions are implemented in a brute-force way: Every new string or substring that is needed is simply copied into the string pool. Space is eventually reclaimed by a procedure called |do_compaction| with the aid of a simple system system of reference counts. @^reference counts@> The number of references to string number |s| will be |str_ref[s]|. The special value |str_ref[s]=max_str_ref=127| is used to denote an unknown positive number of references; such strings will never be recycled. If a string is ever referred to more than 126 times, simultaneously, we put it in this category. Hence a single byte suffices to store each |str_ref|. @d max_str_ref=127 {``infinite'' number of references} @d add_str_ref(#)==begin if str_ref[#]= @!str_ref:array[str_number] of 0..max_str_ref; @ Here's what we do when a string reference disappears: @d delete_str_ref(#)== begin if str_ref[#]1 then decr(str_ref[#])@+else flush_string(#); end @= procedure flush_string(@!s:str_number); begin stat pool_in_use:=pool_in_use-length(s); decr(strs_in_use); tats@; if next_str[s]<>str_ptr then str_ref[s]:=0 else begin str_ptr:=s; decr(strs_used_up); end; pool_ptr:=str_start[str_ptr]; end; @ Once a sequence of characters has been appended to |str_pool|, it officially becomes a string when the function |make_string| is called. This function returns the identification number of the new string as its value. When getting the next unused string number from the linked list, we pretend that $$ \hbox{|max_str_ptr+1|, |max_str_ptr+2|, $\ldots$, |max_strings|} $$ are linked sequentially even though the |next_str| entries have not been initialized yet. We never allow |str_ptr| to reach |max_strings|; |do_compaction| is responsible for making sure of this. @p @t\4@>@@; @t\4@>@@; function make_string : str_number; {current string enters the pool} label restart; var @!s:str_number; {the new string} begin restart: s:=str_ptr; str_ptr:=next_str[s]; if str_ptr>max_str_ptr then if str_ptr=max_strings then begin str_ptr:=s; do_compaction(0); goto restart; end else begin debug if strs_used_up<>max_str_ptr then confusion("s");@+gubed@/ @:this can't happen s}{\quad \.s@> max_str_ptr:=str_ptr; next_str[str_ptr]:=max_str_ptr+1; end; str_ref[s]:=1; str_start[str_ptr]:=pool_ptr; incr(strs_used_up); stat incr(strs_in_use); pool_in_use:=pool_in_use+length(s); if pool_in_use>max_pl_used then max_pl_used:=pool_in_use; if strs_in_use>max_strs_used then max_strs_used:=strs_in_use; tats@; make_string:=s; end; @ On rare occasions, we might decide after calling |make_string| that some characters should be removed from the end of the last string and transferred to the beginning of a string under construction. This basically a matter of resetting |str_start[str_ptr]|. It is not practical to ensure that the new value for this pointer is in range, so this procedure should be used carefully. @p procedure chop_last_string(@!p:pool_pointer); begin stat pool_in_use:=pool_in_use-(str_start[str_ptr]-p); @+tats; str_start[str_ptr]:=p; end; @ The most interesting string operation is string pool compaction. The idea is to recover unused space in the |str_pool| array by recopying the strings to close the gaps created when some strings become unused. All string numbers~$k$ where |str_ref[k]=0| are to be linked into the list of free string numbers after |str_ptr|. If this fails to free enough pool space we issue an |overflow| error unless |needed=pool_size|. Calling |do_compaction| with |needed=pool_size| supresses all overflow tests. The compaction process starts with |last_fixed_str| because all lower numbered strings are permanently allocated with |max_str_ref| in their |str_ref| entries. @= @!last_fixed_str:str_number; {last permanently allocated string} @!fixed_str_use:str_number; {number of permanently allocated strings} @ @= procedure do_compaction(@!needed:pool_pointer); label done; var @!str_use:str_number; {a count of strings in use} @!r,@!s,@!t:str_number; {strings being manipulated} @!p,@!q:pool_pointer; {destination and source for copying string characters} begin @; r:=last_fixed_str; s:=next_str[r]; p:=str_start[s]; while s<>str_ptr do begin while str_ref[s]=0 do @; r:=s; s:=next_str[s]; incr(str_use); @; end; done: @; if needed; stat @; tats@; strs_used_up:=str_use; end; @ @= t:=next_str[last_fixed_str]; while (str_ref[t]=max_str_ref)and(t<>str_ptr) do begin incr(fixed_str_use); last_fixed_str:=t; t:=next_str[t]; end; str_use:=fixed_str_use @ Because of the way |flush_string| has been written, it should never be necessary to |goto done| here. The extra line of code seems worthwhile to preserve the generality of |do_compaction|. @= begin t:=s; s:=next_str[s]; next_str[r]:=s; next_str[t]:=next_str[str_ptr]; next_str[str_ptr]:=t; if s=str_ptr then goto done; end @ The string currently starts at |str_start[r]| and ends just before |str_start[s]|. We don't change |str_start[s]| because it might be needed to locate the next string. @= q:=str_start[r]; str_start[r]:=p; while q= q:=str_start[str_ptr]; str_start[str_ptr]:=p; while q= begin if str_use>=max_strings-1 then begin str_overflowed:=true; overflow("number of strings", max_strings-1-init_str_use); @:MetaPost capacity exceeded number of strings}{\quad number of strings@> end; if pool_ptr+needed>max_pool_ptr then if pool_ptr+needed>pool_size then begin str_overflowed:=true; overflow("pool size", pool_size-init_pool_ptr); @:MetaPost capacity exceeded pool size}{\quad pool size@> end else max_pool_ptr:=pool_ptr+needed; end @ Routines that can be called after string overflow need a way of checking whether it is safe to use |str_room|, |make_string|, or |do_compaction|. @= @!str_overflowed:boolean; {is \MP\ aborting due to pool size of number of strings?} @ @= if (str_start[str_ptr]<>pool_in_use)or(str_use<>strs_in_use) then confusion("string"); @:this can't happen string}{\quad string@> incr(pact_count); pact_chars:=pact_chars+pool_ptr-str_stop(last_fixed_str); pact_strs:=pact_strs+str_use-fixed_str_use; debug s:=str_ptr; t:=str_use; while s<=max_str_ptr do begin if t>max_str_ptr then confusion(""""); incr(t); s:=next_str[s]; end; if t<=max_str_ptr then confusion(""""); gubed @ A few more global variables are needed to keep track of statistics when |stat| $\ldots$ |tats| blocks are not commented out. @= @!pact_count:integer; {number of string pool compactions so far} @!pact_chars:integer; {total number of characters moved during compactions} @!pact_strs:integer; {total number of strings moved during compactions} @ @= pact_count:=0; pact_chars:=0; pact_strs:=0@; @ The following subroutine compares string |s| with another string of the same length that appears in |buffer| starting at position |k|; the result is |true| if and only if the strings are equal. @p function str_eq_buf(@!s:str_number;@!k:integer):boolean; {test equality of strings} label not_found; {loop exit} var @!j: pool_pointer; {running index} @!result: boolean; {result of comparison} begin j:=str_start[s]; while jbuffer[k] then begin result:=false; goto not_found; end; incr(j); incr(k); end; result:=true; not_found: str_eq_buf:=result; end; @ Here is a similar routine, but it compares two strings in the string pool, and it does not assume that they have the same length. If the first string is lexicographically greater than, less than, or equal to the second, the result is respectively positive, negative, or zero. @p function str_vs_str(@!s,@!t:str_number):integer; {test equality of strings} label exit; var @!j,@!k: pool_pointer; {running indices} @!ls,@!lt:integer; {lengths} @!l:integer; {length remaining to test} begin ls:=length(s); lt:=length(t); if ls<=lt then l:=ls@+else l:=lt; j:=str_start[s]; k:=str_start[t]; while l>0 do begin if str_pool[j]<>str_pool[k] then begin str_vs_str:=str_pool[j]-str_pool[k]; return; end; incr(j); incr(k); decr(l); end; str_vs_str:=ls-lt; exit:end; @ The initial values of |str_pool|, |str_start|, |pool_ptr|, and |str_ptr| are computed by the \.{INIMP} program, based in part on the information that \.{WEB} has output while processing \MP. @.INIMP@> @^string pool@> @p @!init function get_strings_started:boolean; {initializes the string pool, but returns |false| if something goes wrong} label done,exit; var @!k:0..255; {small indices or counters} @!g:str_number; {garbage} begin pool_ptr:=0; str_ptr:=0; max_pool_ptr:=0; max_str_ptr:=0; str_start[0]:=0; next_str[0]:=1; str_overflowed:=false; stat pool_in_use:=0; strs_in_use:=0; max_pl_used:=0; max_strs_used:=0; @; tats@; strs_used_up:=0; @; @; last_fixed_str:=str_ptr-1; fixed_str_use:=str_ptr; exit:end; tini @ The first 256 strings will consist of a single character only. @= for k:=0 to 255 do begin append_char(k); g:=make_string; str_ref[g]:=max_str_ref; end; @ The first 128 strings will contain 95 standard ASCII characters, and the other 33 characters will be printed in three-symbol form like `\.{\^\^A}' unless a system-dependent change is made here. Installations that have an extended character set, where for example |xchr[@'32]=@t\.{\'^^Z\'}@>|, would like string @'32 to be printed as the single character @'32 instead of the three characters @'136, @'136, @'132 (\.{\^\^Z}). On the other hand, even people with an extended character set will want to represent string @'15 by \.{\^\^M}, since @'15 is ASCII's ``carriage return'' code; the idea is to produce visible strings instead of tabs or line-feeds or carriage-returns or bell-rings or characters that are treated anomalously in text files. Unprintable characters of codes 128--255 are, similarly, rendered \.{\^\^80}--\.{\^\^ff}. The boolean expression defined here should be |true| unless \MP\ internal code number~|k| corresponds to a non-troublesome visible symbol in the local character set. If character |k| cannot be printed, and |k<@'200|, then character |k+@'100| or |k-@'100| must be printable; moreover, ASCII codes |[@'60..@'71, @'141..@'146]| must be printable. @^character set dependencies@> @^system dependencies@> @= (k<" ")or(k>"~") @ When the \.{WEB} system program called \.{TANGLE} processes the \.{MP.WEB} description that you are now reading, it outputs the \PASCAL\ program \.{MP.PAS} and also a string pool file called \.{MP.POOL}. The \.{INIMP} @.WEB@>@.INIMP@> program reads the latter file, where each string appears as a two-digit decimal length followed by the string itself, and the information is recorded in \MP's string memory. @= @!init @!pool_file:alpha_file; {the string-pool file output by \.{TANGLE}} tini @ @= g := loadpoolstrings((pool_size-string_vacancies)); if g=0 then begin wake_up_terminal; write_ln(term_out,'! You have to increase POOLSIZE.'); get_strings_started:=false; return; end; get_strings_started:=true; @ The \.{WEB} operation \.{@@\$} denotes the value that should be at the end of this \.{MP.POOL} file; any other value means that the wrong pool file has been loaded. @^check sum@> @* \[5] On-line and off-line printing. Messages that are sent to a user's terminal and to the transcript-log file are produced by several `|print|' procedures. These procedures will direct their output to a variety of places, based on the setting of the global variable |selector|, which has the following possible values: \yskip \hang |term_and_log|, the normal setting, prints on the terminal and on the transcript file. \hang |log_only|, prints only on the transcript file. \hang |term_only|, prints only on the terminal. \hang |no_print|, doesn't print at all. This is used only in rare cases before the transcript file is open. \hang |ps_file_only| prints only on the \ps\ output file. \hang |pseudo|, puts output into a cyclic buffer that is used by the |show_context| routine; when we get to that routine we shall discuss the reasoning behind this curious mode. \hang |new_string|, appends the output to the current string in the string pool. \hang |0..max_write_files-1| prints on one of the files used for the \&{write} @:write_}{\&{write} primitive@> command. \yskip \noindent The symbolic names `|term_and_log|', etc., have been assigned numeric codes that satisfy the convenient relations |no_print+1=term_only|, |no_print+2=log_only|, |term_only+2=log_only+1=term_and_log|. These relations are not used when |selector| could be |pseudo|, |new_string|, or |ps_file_only|. We need not check for unprintable characters when |selector= @!log_file : alpha_file; {transcript of \MP\ session} @!ps_file: alpha_file; {the generic font output goes here} @!selector : 0..max_selector; {where to print a message} @!dig : array[0..22] of 0..15; {digits in a number being output} @!tally : integer; {the number of characters recently printed} @!term_offset : 0..max_print_line; {the number of characters on the current terminal line} @!file_offset : 0..max_print_line; {the number of characters on the current file line} @!ps_offset : integer; {the number of characters on the current \ps\ file line} @!trick_buf:array[0..error_line] of ASCII_code; {circular buffer for pseudoprinting} @!trick_count: integer; {threshold for pseudoprinting, explained later} @!first_count: integer; {another variable for pseudoprinting} @ @= selector:=term_only; tally:=0; term_offset:=0; file_offset:=0; ps_offset:=0; @ Macro abbreviations for output to the terminal and to the log file are defined here for convenience. Some systems need special conventions for terminal output, and it is possible to adhere to those conventions by changing |wterm|, |wterm_ln|, and |wterm_cr| here. @^system dependencies@> @d wterm(#)==write(term_out,#) @d wterm_ln(#)==write_ln(term_out,#) @d wterm_cr==write_ln(term_out) @d wlog(#)==write(log_file,#) @d wlog_ln(#)==write_ln(log_file,#) @d wlog_cr==write_ln(log_file) @d wps(#)==write(ps_file,#) @d wps_ln(#)==write_ln(ps_file,#) @d wps_cr==write_ln(ps_file) @ To end a line of text output, we call |print_ln|. Cases |0..max_write_files| use an array |wr_file| that will be declared later. @= procedure print_ln; {prints an end-of-line} begin case selector of term_and_log: begin wterm_cr; wlog_cr; term_offset:=0; file_offset:=0; end; log_only: begin wlog_cr; file_offset:=0; end; term_only: begin wterm_cr; term_offset:=0; end; ps_file_only: begin wps_cr; ps_offset:=0; end; no_print,pseudo,new_string: do_nothing; othercases write_ln(wr_file[selector]) endcases; end; {note that |tally| is not affected} @ The |print_visible_char| procedure sends one character to the desired destination, using the |xchr| array to map it into an external character compatible with |input_ln|. (It assumes that it is always called with a visible ASCII character.) All printing comes through |print_ln| or |print_char|, which ultimately calls |print_visible_char|, hence these routines are the ones that limit lines to at most |max_print_line| characters. But we must make an exception for the \ps\ output file since it is not safe to cut up lines arbitrarily in \ps. Procedure |unit_str_room| needs to be declared |forward| here because it calls |do_compaction| and |do_compaction| can call the error routines. Actually, |unit_str_room| avoids |overflow| errors but it can call |confusion|. @= procedure@?unit_str_room; forward;@t\2@>@/ procedure print_visible_char(@!s:ASCII_code); {prints a single character} label done; begin case selector of term_and_log: begin wterm(xchr[s]); wlog(xchr[s]); incr(term_offset); incr(file_offset); if term_offset=max_print_line then begin wterm_cr; term_offset:=0; end; if file_offset=max_print_line then begin wlog_cr; file_offset:=0; end; end; log_only: begin wlog(xchr[s]); incr(file_offset); if file_offset=max_print_line then print_ln; end; term_only: begin wterm(xchr[s]); incr(term_offset); if term_offset=max_print_line then print_ln; end; ps_file_only: if s=13 then begin wps_cr; ps_offset:=0; end else begin wps(xchr[s]); incr(ps_offset); end; no_print: do_nothing; pseudo: if tally=max_pool_ptr then begin unit_str_room; if pool_ptr>=pool_size then goto done; {drop characters if string space is full} end; append_char(s); end; othercases write(wr_file[selector],xchr[s]) endcases; done:incr(tally); end; @ The |print_char| procedure sends one character to the desired destination. File names and string expressions might contain |ASCII_code| values that can't be printed using |print_visible_char|. These characters will be printed in three- or four-symbol form like `\.{\^\^A}' or `\.{\^\^e4}'. (This procedure assumes that it is safe to bypass all checks for unprintable characters when |selector| is in the range |0..max_write_files-1| or when |selector=ps_file_only|. In the former case the user might want to write unprintable characters, and in the latter case the \ps\ printing routines check their arguments themselves before calling |print_char| or |print|.) @d print_lc_hex(#)==l:=#; if l<10 then print_visible_char(l+"0")@+else print_visible_char(l-10+"a") @= procedure print_char(@!k:ASCII_code); {prints a single character} var l:0..255; {small index or counter} begin if selector then begin print_visible_char("^"); print_visible_char("^"); if k<@'100 then print_visible_char(k+@'100) else if k<@'200 then print_visible_char(k-@'100) else begin print_lc_hex(k div 16); print_lc_hex(k mod 16); end; end else print_visible_char(k); end; @ An entire string is output by calling |print|. Note that if we are outputting the single standard ASCII character \.c, we could call |print("c")|, since |"c"=99| is the number of a single-character string, as explained above. But |print_char("c")| is quicker, so \MP\ goes directly to the |print_char| routine when it knows that this is safe. (The present implementation assumes that it is always safe to print a visible ASCII character.) @^system dependencies@> @= procedure print(@!s:integer); {prints string |s|} var @!j:pool_pointer; {current character code position} begin if (s<0)or(s>max_str_ptr) then s:="???"; {this can't happen} @.???@> j:=str_start[s]; while j= update_terminal; @ The procedure |print_nl| is like |print|, but it makes sure that the string appears at the beginning of a new line. @= procedure print_nl(@!s:str_number); {prints string |s| at beginning of line} begin case selector of term_and_log: if (term_offset>0)or(file_offset>0) then print_ln; log_only: if file_offset>0 then print_ln; term_only: if term_offset>0 then print_ln; ps_file_only: if ps_offset>0 then print_ln; no_print,pseudo,new_string: do_nothing; end; {there are no other cases} print(s); end; @ An array of digits in the range |0..9| is printed by |print_the_digs|. @= procedure print_the_digs(@!k:eight_bits); {prints |dig[k-1]|$\,\ldots\,$|dig[0]|} begin while k>0 do begin decr(k); print_char("0"+dig[k]); end; end; @ The following procedure, which prints out the decimal representation of a given integer |n|, has been written carefully so that it works properly if |n=0| or if |(-n)| would cause overflow. It does not apply |mod| or |div| to negative arguments, since such operations are not implemented consistently by all \PASCAL\ compilers. @= procedure print_int(@!n:integer); {prints an integer in decimal form} var k:0..23; {index to current digit; we assume that $|n|<10^{23}$} @!m:integer; {used to negate |n| in possibly dangerous cases} begin k:=0; if n<0 then begin print_char("-"); if n>-100000000 then negate(n) else begin m:=-1-n; n:=m div 10; m:=(m mod 10)+1; k:=1; if m<10 then dig[0]:=m else begin dig[0]:=0; incr(n); end; end; end; repeat dig[k]:=n mod 10; n:=n div 10; incr(k); until n=0; print_the_digs(k); end; @ \MP\ also makes use of a trivial procedure to print two digits. The following subroutine is usually called with a parameter in the range |0<=n<=99|. @p procedure print_dd(@!n:integer); {prints two least significant digits} begin n:=abs(n) mod 100; print_char("0"+(n div 10)); print_char("0"+(n mod 10)); end; @ Here is a procedure that asks the user to type a line of input, assuming that the |selector| setting is either |term_only| or |term_and_log|. The input is placed into locations |first| through |last-1| of the |buffer| array, and echoed on the transcript file if appropriate. This procedure is never called when |interaction term_offset:=0; {the user's line ended with \<\rm return>} decr(selector); {prepare to echo the input} if last<>first then for k:=first to last-1 do print(buffer[k]); print_ln; buffer[last]:="%"; incr(selector); {restore previous status} end; @* \[6] Reporting errors. When something anomalous is detected, \MP\ typically does something like this: $$\vbox{\halign{#\hfil\cr |print_err("Something anomalous has been detected");|\cr |help3("This is the first line of my offer to help.")|\cr |("This is the second line. I'm trying to")|\cr |("explain the best way for you to proceed.");|\cr |error;|\cr}}$$ A two-line help message would be given using |help2|, etc.; these informal helps should use simple vocabulary that complements the words used in the official error message that was printed. (Outside the U.S.A., the help messages should preferably be translated into the local vernacular. Each line of help is at most 60 characters long, in the present implementation, so that |max_print_line| will not be exceeded.) The |print_err| procedure supplies a `\.!' before the official message, and makes sure that the terminal is awake if a stop is going to occur. The |error| procedure supplies a `\..' after the official message, then it shows the location of the error; and if |interaction=error_stop_mode|, it also enters into a dialog with the user, during which time the help message may be printed. @^system dependencies@> @ The global variable |interaction| has four settings, representing increasing amounts of user interaction: @d batch_mode=0 {omits all stops and omits terminal output} @d nonstop_mode=1 {omits all stops} @d scroll_mode=2 {omits error stops} @d error_stop_mode=3 {stops at every opportunity to interact} @d print_err(#)==begin if interaction=error_stop_mode then wake_up_terminal; print_nl("! "); print(#); @.!\relax@> end @= @!interaction:batch_mode..error_stop_mode; {current level of interaction} @ @=interaction:=error_stop_mode; @ \MP\ is careful not to call |error| when the print |selector| setting might be unusual. The only possible values of |selector| at the time of error messages are \yskip\hang|no_print| (when |interaction=batch_mode| and |log_file| not yet open); \hang|term_only| (when |interaction>batch_mode| and |log_file| not yet open); \hang|log_only| (when |interaction=batch_mode| and |log_file| is open); \hang|term_and_log| (when |interaction>batch_mode| and |log_file| is open). @= if interaction=batch_mode then selector:=no_print@+else selector:=term_only @ A global variable |deletions_allowed| is set |false| if the |get_next| routine is active when |error| is called; this ensures that |get_next| will never be called recursively. @^recursion@> The global variable |history| records the worst level of error that has been detected. It has four possible values: |spotless|, |warning_issued|, |error_message_issued|, and |fatal_error_stop|. Another global variable, |error_count|, is increased by one when an |error| occurs without an interactive dialog, and it is reset to zero at the end of every statement. If |error_count| reaches 100, \MP\ decides that there is no point in continuing further. @d spotless=0 {|history| value when nothing has been amiss yet} @d warning_issued=1 {|history| value when |begin_diagnostic| has been called} @d error_message_issued=2 {|history| value when |error| has been called} @d fatal_error_stop=3 {|history| value when termination was premature} @= @!deletions_allowed:boolean; {is it safe for |error| to call |get_next|?} @!history:spotless..fatal_error_stop; {has the source input been clean so far?} @!error_count:-1..100; {the number of scrolled errors since the last statement ended} @ The value of |history| is initially |fatal_error_stop|, but it will be changed to |spotless| if \MP\ survives the initialization process. @= deletions_allowed:=true; error_count:=0; {|history| is initialized elsewhere} @ Since errors can be detected almost anywhere in \MP, we want to declare the error procedures near the beginning of the program. But the error procedures in turn use some other procedures, which need to be declared |forward| before we get to |error| itself. It is possible for |error| to be called recursively if some error arises when |get_next| is being used to delete a token, and/or if some fatal error occurs while \MP\ is trying to fix a non-fatal one. But such recursion @^recursion@> is never more than two levels deep. @= procedure@?normalize_selector; forward;@t\2@>@/ procedure@?get_next; forward;@t\2@>@/ procedure@?term_input; forward;@t\2@>@/ procedure@?show_context; forward;@t\2@>@/ procedure@?begin_file_reading; forward;@t\2@>@/ procedure@?open_log_file; forward;@t\2@>@/ procedure@?close_files_and_terminate; forward;@t\2@>@/ procedure@?clear_for_error_prompt; forward;@t\2@>@/ @t\4\hskip-\fontdimen2\font@>@;@+@!debug@+procedure@?debug_help; forward;@;@+gubed@;@/ @t\4@>@ @ Individual lines of help are recorded in the array |help_line|, which contains entries in positions |0..(help_ptr-1)|. They should be printed in reverse order, i.e., with |help_line[0]| appearing last. @d hlp1(#)==help_line[0]:=#;@+end @d hlp2(#)==help_line[1]:=#; hlp1 @d hlp3(#)==help_line[2]:=#; hlp2 @d hlp4(#)==help_line[3]:=#; hlp3 @d hlp5(#)==help_line[4]:=#; hlp4 @d hlp6(#)==help_line[5]:=#; hlp5 @d help0==help_ptr:=0 {sometimes there might be no help} @d help1==@+begin help_ptr:=1; hlp1 {use this with one help line} @d help2==@+begin help_ptr:=2; hlp2 {use this with two help lines} @d help3==@+begin help_ptr:=3; hlp3 {use this with three help lines} @d help4==@+begin help_ptr:=4; hlp4 {use this with four help lines} @d help5==@+begin help_ptr:=5; hlp5 {use this with five help lines} @d help6==@+begin help_ptr:=6; hlp6 {use this with six help lines} @= @!help_line:array[0..5] of str_number; {helps for the next |error|} @!help_ptr:0..6; {the number of help lines present} @!use_err_help:boolean; {should the |err_help| string be shown?} @!err_help:str_number; {a string set up by \&{errhelp}} @!filename_template:str_number; {a string set up by \&{filenametemplate}} @ @= help_ptr:=0; use_err_help:=false; err_help:=0; filename_template:=0; @ The |jump_out| procedure just cuts across all active procedure levels and goes to |end_of_MP|. This is the only nonlocal |@!goto| statement in the whole program. It is used when there is no recovery from a particular error. Some \PASCAL\ compilers do not implement non-local |goto| statements. @^system dependencies@> In such cases the body of |jump_out| should simply be `|close_files_and_terminate|;\thinspace' followed by a call on some system procedure that quietly terminates the program. @= procedure jump_out; begin goto end_of_MP; end; @ Here now is the general |error| routine. @= procedure error; {completes the job of error reporting} label continue,exit; var @!c:ASCII_code; {what the user types} @!s1,@!s2,@!s3:integer; {used to save global variables when deleting tokens} @!j:pool_pointer; {character position being printed} begin if history; incr(error_count); if error_count=100 then begin print_nl("(That makes 100 errors; please try again.)"); @.That makes 100 errors...@> history:=fatal_error_stop; jump_out; end; @; exit:end; @ @= loop@+begin continue: clear_for_error_prompt; prompt_input("? "); @.?\relax@> if last=first then return; c:=buffer[first]; if c>="a" then c:=c+"A"-"a"; {convert to uppercase} @; end @ It is desirable to provide an `\.E' option here that gives the user an easy way to return from \MP\ to the system editor, with the offending line ready to be edited. But such an extension requires some system wizardry, so the present implementation simply types out the name of the file that should be edited and the relevant line number. @^system dependencies@> There is a secret `\.D' option available when the debugging routines haven't been commented~out. @^debugging@> @= case c of "0","1","2","3","4","5","6","7","8","9": if deletions_allowed then @; @t\4\4@>@;@+@!debug "D":begin debug_help;goto continue;@+end;@+gubed@/ "E": if file_ptr>0 then begin print_nl("You want to edit file "); @.You want to edit file x@> print(input_stack[file_ptr].name_field); print(" at line "); print_int(true_line);@/ interaction:=scroll_mode; jump_out; end; "H": @; "I":@; "Q","R","S":@; "X":begin interaction:=scroll_mode; jump_out; end; othercases do_nothing endcases;@/ @ @ @= begin print("Type to proceed, S to scroll future error messages,");@/ @.Type to proceed...@> print_nl("R to run without stopping, Q to run quietly,");@/ print_nl("I to insert something, "); if file_ptr>0 then print("E to edit your file,"); if deletions_allowed then print_nl("1 or ... or 9 to ignore the next 1 to 9 tokens of input,"); print_nl("H for help, X to quit."); end @ Here the author of \MP\ apologizes for making use of the numerical relation between |"Q"|, |"R"|, |"S"|, and the desired interaction settings |batch_mode|, |nonstop_mode|, |scroll_mode|. @^Knuth, Donald Ervin@> @= begin error_count:=0; interaction:=batch_mode+c-"Q"; print("OK, entering "); case c of "Q":begin print("batchmode"); decr(selector); end; "R":print("nonstopmode"); "S":print("scrollmode"); end; {there are no other cases} print("..."); print_ln; update_terminal; return; end @ When the following code is executed, |buffer[(first+1)..(last-1)]| may contain the material inserted by the user; otherwise another prompt will be given. In order to understand this part of the program fully, you need to be familiar with \MP's input stacks. @= begin begin_file_reading; {enter a new syntactic level for terminal input} if last>first+1 then begin loc:=first+1; buffer[first]:=" "; end else begin prompt_input("insert>"); loc:=first; @.insert>@> end; first:=last+1; cur_input.limit_field:=last; return; end @ We allow deletion of up to 99 tokens at a time. @= begin s1:=cur_cmd; s2:=cur_mod; s3:=cur_sym; OK_to_interrupt:=false; if (last>first+1) and (buffer[first+1]>="0")and(buffer[first+1]<="9") then c:=c*10+buffer[first+1]-"0"*11 else c:=c-"0"; while c>0 do begin get_next; {one-level recursive call of |error| is possible} @; decr(c); end; cur_cmd:=s1; cur_mod:=s2; cur_sym:=s3; OK_to_interrupt:=true; help2("I have just deleted some text, as you asked.")@/ ("You can now delete more, or insert, or whatever."); show_context; goto continue; end @ @= begin if use_err_help then begin @; use_err_help:=false; end else begin if help_ptr=0 then help2("Sorry, I don't know how to help in this situation.")@/ @t\kern1em@>("Maybe you should try asking a human?"); repeat decr(help_ptr); print(help_line[help_ptr]); print_ln; until help_ptr=0; end; help4("Sorry, I already gave what help I could...")@/ ("Maybe you should try asking a human?")@/ ("An error might have occurred before I noticed any problems.")@/ ("``If all else fails, read the instructions.''");@/ goto continue; end @ @= j:=str_start[err_help]; while jsi("%") then print(so(str_pool[j])) else if j+1=str_stop(err_help) then print_ln else if str_pool[j+1]<>si("%") then print_ln else begin incr(j); print_char("%"); end; incr(j); end @ @= if interaction>batch_mode then decr(selector); {avoid terminal output} if use_err_help then begin print_nl(""); @; end else while help_ptr>0 do begin decr(help_ptr); print_nl(help_line[help_ptr]); end; print_ln; if interaction>batch_mode then incr(selector); {re-enable terminal output} print_ln @ In anomalous cases, the print selector might be in an unknown state; the following subroutine is called to fix things just enough to keep running a bit longer. @p procedure normalize_selector; begin if log_opened then selector:=term_and_log else selector:=term_only; if job_name=0 then open_log_file; if interaction=batch_mode then decr(selector); end; @ The following procedure prints \MP's last words before dying. @d succumb==begin if interaction=error_stop_mode then interaction:=scroll_mode; {no more interaction} if log_opened then error; @!debug if interaction>batch_mode then debug_help;@;@+gubed@;@/ history:=fatal_error_stop; jump_out; {irrecoverable error} end @= procedure fatal_error(@!s:str_number); {prints |s|, and that's it} begin normalize_selector;@/ print_err("Emergency stop"); help1(s); succumb; @.Emergency stop@> end; @ Here is the most dreaded error message. @= procedure overflow(@!s:str_number;@!n:integer); {stop due to finiteness} begin normalize_selector; print_err("MetaPost capacity exceeded, sorry ["); @.MetaPost capacity exceeded ...@> print(s); print_char("="); print_int(n); print_char("]"); help2("If you really absolutely need more capacity,")@/ ("you can ask a wizard to enlarge me."); succumb; end; @ The program might sometime run completely amok, at which point there is no choice but to stop. If no previous error has been detected, that's bad news; a message is printed that is really intended for the \MP\ maintenance person instead of the user (unless the user has been particularly diabolical). The index entries for `this can't happen' may help to pinpoint the problem. @^dry rot@> @= procedure confusion(@!s:str_number); {consistency check violated; |s| tells where} begin normalize_selector; if history help1("I'm broken. Please show this to someone who can fix can fix"); end else begin print_err("I can't go on meeting you like this"); @.I can't go on...@> help2("One of your faux pas seems to have wounded me deeply...")@/ ("in fact, I'm barely conscious. Please fix it and try again."); end; succumb; end; @ Users occasionally want to interrupt \MP\ while it's running. If the \PASCAL\ runtime system allows this, one can implement a routine that sets the global variable |interrupt| to some nonzero value when such an interrupt is signaled. Otherwise there is probably at least a way to make |interrupt| nonzero using the \PASCAL\ debugger. @^system dependencies@> @^debugging@> @d check_interrupt==begin if interrupt<>0 then pause_for_instructions; end @= @!interrupt:integer; {should \MP\ pause for instructions?} @!OK_to_interrupt:boolean; {should interrupts be observed?} @ @= interrupt:=0; OK_to_interrupt:=true; @ When an interrupt has been detected, the program goes into its highest interaction level and lets the user have the full flexibility of the |error| routine. \MP\ checks for interrupts only at times when it is safe to do this. @p procedure pause_for_instructions; begin if OK_to_interrupt then begin interaction:=error_stop_mode; if (selector=log_only)or(selector=no_print) then incr(selector); print_err("Interruption"); @.Interruption@> help3("You rang?")@/ ("Try to insert some instructions for me (e.g.,`I show x'),")@/ ("unless you just want to quit by typing `X'."); deletions_allowed:=false; error; deletions_allowed:=true; interrupt:=0; end; end; @ Many of \MP's error messages state that a missing token has been inserted behind the scenes. We can save string space and program space by putting this common code into a subroutine. @p procedure missing_err(@!s:str_number); begin print_err("Missing `"); print(s); print("' has been inserted"); @.Missing...inserted@> end; @* \[7] Arithmetic with scaled numbers. The principal computations performed by \MP\ are done entirely in terms of integers less than $2^{31}$ in magnitude; thus, the arithmetic specified in this program can be carried out in exactly the same way on a wide variety of computers, including some small ones. @^small computers@> But \PASCAL\ does not define the @!|div| operation in the case of negative dividends; for example, the result of |(-2*n-1) div 2| is |-(n+1)| on some computers and |-n| on others. There are two principal types of arithmetic: ``translation-preserving,'' in which the identity |(a+q*b)div b=(a div b)+q| is valid; and ``negation-preserving,'' in which |(-a)div b=-(a div b)|. This leads to two \MP s, which can produce different results, although the differences should be negligible when the language is being used properly. The \TeX\ processor has been defined carefully so that both varieties of arithmetic will produce identical output, but it would be too inefficient to constrain \MP\ in a similar way. @d el_gordo == @'17777777777 {$2^{31}-1$, the largest value that \MP\ likes} @ One of \MP's most common operations is the calculation of $\lfloor{a+b\over2}\rfloor$, the midpoint of two given integers |a| and~|b|. The only decent way to do this in \PASCAL\ is to write `|(a+b) div 2|'; but on most machines it is far more efficient to calculate `|(a+b)| right shifted one bit'. Therefore the midpoint operation will always be denoted by `|half(a+b)|' in this program. If \MP\ is being implemented with languages that permit binary shifting, the |half| macro should be changed to make this operation as efficient as possible. Since some languages have shift operators that can only be trusted to work on positive numbers, there is also a macro |halfp| that is used only when the quantity being halved is known to be positive or zero. @d half(#)==(#) div 2 @d halfp(#)==(#) div 2 @ A single computation might use several subroutine calls, and it is desirable to avoid producing multiple error messages in case of arithmetic overflow. So the routines below set the global variable |arith_error| to |true| instead of reporting errors directly to the user. @^overflow in arithmetic@> @= @!arith_error:boolean; {has arithmetic overflow occurred recently?} @ @= arith_error:=false; @ At crucial points the program will say |check_arith|, to test if an arithmetic error has been detected. @d check_arith==begin if arith_error then clear_arith;@+end @p procedure clear_arith; begin print_err("Arithmetic overflow"); @.Arithmetic overflow@> help4("Uh, oh. A little while ago one of the quantities that I was")@/ ("computing got too large, so I'm afraid your answers will be")@/ ("somewhat askew. You'll probably have to adopt different")@/ ("tactics next time. But I shall try to carry on anyway."); error; arith_error:=false; end; @ Addition is not always checked to make sure that it doesn't overflow, but in places where overflow isn't too unlikely the |slow_add| routine is used. @p function slow_add(@!x,@!y:integer):integer; begin if x>=0 then if y<=el_gordo-x then slow_add:=x+y else begin arith_error:=true; slow_add:=el_gordo; end else if -y<=el_gordo+x then slow_add:=x+y else begin arith_error:=true; slow_add:=-el_gordo; end; end; @ Fixed-point arithmetic is done on {\sl scaled integers\/} that are multiples of $2^{-16}$. In other words, a binary point is assumed to be sixteen bit positions from the right end of a binary computer word. @d quarter_unit == @'40000 {$2^{14}$, represents 0.250000} @d half_unit == @'100000 {$2^{15}$, represents 0.50000} @d three_quarter_unit == @'140000 {$3\cdot2^{14}$, represents 0.75000} @d unity == @'200000 {$2^{16}$, represents 1.00000} @d two == @'400000 {$2^{17}$, represents 2.00000} @d three == @'600000 {$2^{17}+2^{16}$, represents 3.00000} @= @!scaled = integer; {this type is used for scaled integers} @!small_number=0..63; {this type is self-explanatory} @ The following function is used to create a scaled integer from a given decimal fraction $(.d_0d_1\ldots d_{k-1})$, where |0<=k<=17|. The digit $d_i$ is given in |dig[i]|, and the calculation produces a correctly rounded result. @p function round_decimals(@!k:small_number) : scaled; {converts a decimal fraction} var @!a:integer; {the accumulator} begin a:=0; while k>0 do begin decr(k); a:=(a+dig[k]*two) div 10; end; round_decimals:=halfp(a+1); end; @ Conversely, here is a procedure analogous to |print_int|. If the output of this procedure is subsequently read by \MP\ and converted by the |round_decimals| routine above, it turns out that the original value will be reproduced exactly. A decimal point is printed only if the value is not an integer. If there is more than one way to print the result with the optimum number of digits following the decimal point, the closest possible value is given. The invariant relation in the \&{repeat} loop is that a sequence of decimal digits yet to be printed will yield the original number if and only if they form a fraction~$f$ in the range $s-\delta\L10\cdot2^{16}f= procedure print_scaled(@!s:scaled); {prints scaled real, rounded to five digits} var @!delta:scaled; {amount of allowable inaccuracy} begin if s<0 then begin print_char("-"); negate(s); {print the sign, if negative} end; print_int(s div unity); {print the integer part} s:=10*(s mod unity)+5; if s<>5 then begin delta:=10; print_char("."); repeat if delta>unity then s:=s+@'100000-(delta div 2); {round the final digit} print_char("0"+(s div unity)); s:=10*(s mod unity); delta:=delta*10; until s<=delta; end; end; @ We often want to print two scaled quantities in parentheses, separated by a comma. @= procedure print_two(@!x,@!y:scaled); {prints `|(x,y)|'} begin print_char("("); print_scaled(x); print_char(","); print_scaled(y); print_char(")"); end; @ The |scaled| quantities in \MP\ programs are generally supposed to be less than $2^{12}$ in absolute value, so \MP\ does much of its internal arithmetic with 28~significant bits of precision. A |fraction| denotes a scaled integer whose binary point is assumed to be 28 bit positions from the right. @d fraction_half==@'1000000000 {$2^{27}$, represents 0.50000000} @d fraction_one==@'2000000000 {$2^{28}$, represents 1.00000000} @d fraction_two==@'4000000000 {$2^{29}$, represents 2.00000000} @d fraction_three==@'6000000000 {$3\cdot2^{28}$, represents 3.00000000} @d fraction_four==@'10000000000 {$2^{30}$, represents 4.00000000} @= @!fraction=integer; {this type is used for scaled fractions} @ In fact, the two sorts of scaling discussed above aren't quite sufficient; \MP\ has yet another, used internally to keep track of angles in units of $2^{-20}$ degrees. @d forty_five_deg==@'264000000 {$45\cdot2^{20}$, represents $45^\circ$} @d ninety_deg==@'550000000 {$90\cdot2^{20}$, represents $90^\circ$} @d one_eighty_deg==@'1320000000 {$180\cdot2^{20}$, represents $180^\circ$} @d three_sixty_deg==@'2640000000 {$360\cdot2^{20}$, represents $360^\circ$} @= @!angle=integer; {this type is used for scaled angles} @ The |make_fraction| routine produces the |fraction| equivalent of |p/q|, given integers |p| and~|q|; it computes the integer $f=\lfloor2^{28}p/q+{1\over2}\rfloor$, when $p$ and $q$ are positive. If |p| and |q| are both of the same scaled type |t|, the ``type relation'' |make_fraction(t,t)=fraction| is valid; and it's also possible to use the subroutine ``backwards,'' using the relation |make_fraction(t,fraction)=t| between scaled types. If the result would have magnitude $2^{31}$ or more, |make_fraction| sets |arith_error:=true|. Most of \MP's internal computations have been designed to avoid this sort of error. If this subroutine were programmed in assembly language on a typical machine, we could simply compute |(@t$2^{28}$@>*p)div q|, since a double-precision product can often be input to a fixed-point division instruction. But when we are restricted to \PASCAL\ arithmetic it is necessary either to resort to multiple-precision maneuvering or to use a simple but slow iteration. The multiple-precision technique would be about three times faster than the code adopted here, but it would be comparatively long and tricky, involving about sixteen additional multiplications and divisions. This operation is part of \MP's ``inner loop''; indeed, it will consume nearly 10\pct! of the running time (exclusive of input and output) if the code below is left unchanged. A machine-dependent recoding will therefore make \MP\ run faster. The present implementation is highly portable, but slow; it avoids multiplication and division except in the initial stage. System wizards should be careful to replace it with a routine that is guaranteed to produce identical results in all cases. @^system dependencies@> As noted below, a few more routines should also be replaced by machine-dependent code, for efficiency. But when a procedure is not part of the ``inner loop,'' such changes aren't advisable; simplicity and robustness are preferable to trickery, unless the cost is too high. @^inner loop@> @p function make_fraction(@!p,@!q:integer):fraction; var @!f:integer; {the fraction bits, with a leading 1 bit} @!n:integer; {the integer part of $\vert p/q\vert$} @!negative:boolean; {should the result be negated?} @!be_careful:integer; {disables certain compiler optimizations} begin if p>=0 then negative:=false else begin negate(p); negative:=true; end; if q<=0 then begin debug if q=0 then confusion("/");@;@+gubed@;@/ @:this can't happen /}{\quad \./@> negate(q); negative:=not negative; end; n:=p div q; p:=p mod q; if n>=8 then begin arith_error:=true; if negative then make_fraction:=-el_gordo@+else make_fraction:=el_gordo; end else begin n:=(n-1)*fraction_one; @; if negative then make_fraction:=-(f+n)@+else make_fraction:=f+n; end; end; @ The |repeat| loop here preserves the following invariant relations between |f|, |p|, and~|q|: (i)~|0<=p @= f:=1; repeat be_careful:=p-q; p:=be_careful+p; if p>=0 then f:=f+f+1 else begin double(f); p:=p+q; end; until f>=fraction_one; be_careful:=p-q; if be_careful+p>=0 then incr(f) @ The dual of |make_fraction| is |take_fraction|, which multiplies a given integer~|q| by a fraction~|f|. When the operands are positive, it computes $p=\lfloor qf/2^{28}+{1\over2}\rfloor$, a symmetric function of |q| and~|f|. This routine is even more ``inner loopy'' than |make_fraction|; the present implementation consumes almost 20\pct! of \MP's computation time during typical jobs, so a machine-language substitute is advisable. @^inner loop@> @^system dependencies@> @p function take_fraction(@!q:integer;@!f:fraction):integer; var @!p:integer; {the fraction so far} @!negative:boolean; {should the result be negated?} @!n:integer; {additional multiple of $q$} @!be_careful:integer; {disables certain compiler optimizations} begin @=0| and |q>=0|@>; if f; be_careful:=n-el_gordo; if be_careful+p>0 then begin arith_error:=true; n:=el_gordo-p; end; if negative then take_fraction:=-(n+p) else take_fraction:=n+p; end; @ @=0| and |q>=0|@>= if f>=0 then negative:=false else begin negate(f); negative:=true; end; if q<0 then begin negate(q); negative:=not negative; end; @ The invariant relations in this case are (i)~$\lfloor(qf+p)/2^k\rfloor =\lfloor qf_0/2^{28}+{1\over2}\rfloor$, where $k$ is an integer and $f_0$ is the original value of~$f$; (ii)~$2^k\L f<2^{k+1}$. @^inner loop@> @= p:=fraction_half; {that's $2^{27}$; the invariants hold now with $k=28$} if q @p function take_scaled(@!q:integer;@!f:scaled):integer; var @!p:integer; {the fraction so far} @!negative:boolean; {should the result be negated?} @!n:integer; {additional multiple of $q$} @!be_careful:integer; {disables certain compiler optimizations} begin @=0| and |q>=0|@>; if f; be_careful:=n-el_gordo; if be_careful+p>0 then begin arith_error:=true; n:=el_gordo-p; end; if negative then take_scaled:=-(n+p) else take_scaled:=n+p; end; @ @= p:=half_unit; {that's $2^{15}$; the invariants hold now with $k=16$} @^inner loop@> if q=0 then negative:=false else begin negate(p); negative:=true; end; if q<=0 then begin debug if q=0 then confusion("/");@+gubed@;@/ @:this can't happen /}{\quad \./@> negate(q); negative:=not negative; end; n:=p div q; p:=p mod q; if n>=@'100000 then begin arith_error:=true; if negative then make_scaled:=-el_gordo@+else make_scaled:=el_gordo; end else begin n:=(n-1)*unity; @; if negative then make_scaled:=-(f+n)@+else make_scaled:=f+n; end; end; @ @= f:=1; repeat be_careful:=p-q; p:=be_careful+p; if p>=0 then f:=f+f+1 else begin double(f); p:=p+q; end; until f>=unity; be_careful:=p-q; if be_careful+p>=0 then incr(f) @ Here is a typical example of how the routines above can be used. It computes the function $${1\over3\tau}f(\theta,\phi)= {\tau^{-1}\bigl(2+\sqrt2\,(\sin\theta-{1\over16}\sin\phi) (\sin\phi-{1\over16}\sin\theta)(\cos\theta-\cos\phi)\bigr)\over 3\,\bigl(1+{1\over2}(\sqrt5-1)\cos\theta+{1\over2}(3-\sqrt5\,)\cos\phi\bigr)},$$ where $\tau$ is a |scaled| ``tension'' parameter. This is \MP's magic fudge factor for placing the first control point of a curve that starts at an angle $\theta$ and ends at an angle $\phi$ from the straight path. (Actually, if the stated quantity exceeds 4, \MP\ reduces it to~4.) The trigonometric quantity to be multiplied by $\sqrt2$ is less than $\sqrt2$. (It's a sum of eight terms whose absolute values can be bounded using relations such as $\sin\theta\cos\theta\L{1\over2}$.) Thus the numerator is positive; and since the tension $\tau$ is constrained to be at least $3\over4$, the numerator is less than $16\over3$. The denominator is nonnegative and at most~6. Hence the fixed-point calculations below are guaranteed to stay within the bounds of a 32-bit computer word. The angles $\theta$ and $\phi$ are given implicitly in terms of |fraction| arguments |st|, |ct|, |sf|, and |cf|, representing $\sin\theta$, $\cos\theta$, $\sin\phi$, and $\cos\phi$, respectively. @p function velocity(@!st,@!ct,@!sf,@!cf:fraction;@!t:scaled):fraction; var @!acc,@!num,@!denom:integer; {registers for intermediate calculations} begin acc:=take_fraction(st-(sf div 16), sf-(st div 16)); acc:=take_fraction(acc,ct-cf); num:=fraction_two+take_fraction(acc,379625062); {$2^{28}\sqrt2\approx379625062.497$} denom:=fraction_three+take_fraction(ct,497706707)+take_fraction(cf,307599661); {$3\cdot2^{27}\cdot(\sqrt5-1)\approx497706706.78$ and $3\cdot2^{27}\cdot(3-\sqrt5\,)\approx307599661.22$} if t<>unity then num:=make_scaled(num,t); {|make_scaled(fraction,scaled)=fraction|} if num div 4>=denom then velocity:=fraction_four else velocity:=make_fraction(num,denom); end; @ The following somewhat different subroutine tests rigorously if $ab$ is greater than, equal to, or less than~$cd$, given integers $(a,b,c,d)$. In most cases a quick decision is reached. The result is $+1$, 0, or~$-1$ in the three respective cases. @d return_sign(#)==begin ab_vs_cd:=#; return; end @p function ab_vs_cd(@!a,b,c,d:integer):integer; label exit; var @!q,@!r:integer; {temporary registers} begin @=0|, |b,d>0|@>; loop@+ begin q := a div d; r := c div b; if q<>r then if q>r then return_sign(1)@+else return_sign(-1); q := a mod d; r := c mod b; if r=0 then if q=0 then return_sign(0)@+else return_sign(1); if q=0 then return_sign(-1); a:=b; b:=q; c:=d; d:=r; end; {now |a>d>0| and |c>b>0|} exit:end; @ @= if a<0 then begin negate(a); negate(b); end; if c<0 then begin negate(c); negate(d); end; if d<=0 then begin if b>=0 then if ((a=0)or(b=0))and((c=0)or(d=0)) then return_sign(0) else return_sign(1); if d=0 then if a=0 then return_sign(0)@+else return_sign(-1); q:=a; a:=c; c:=q; q:=-b; b:=-d; d:=q; end else if b<=0 then begin if b<0 then if a>0 then return_sign(-1); if c=0 then return_sign(0) else return_sign(-1); end @ We conclude this set of elementary routines with some simple rounding and truncation operations that are coded in a machine-independent fashion. The routines are slightly complicated because we want them to work without overflow whenever $-2^{31}\L x<2^{31}$. @p function floor_scaled(@!x:scaled):scaled; {$2^{16}\lfloor x/2^{16}\rfloor$} var @!be_careful:integer; {temporary register} begin if x>=0 then floor_scaled:=x-(x mod unity) else begin be_careful:=x+1; floor_scaled:=x+((-be_careful) mod unity)+1-unity; end; end; @# function round_unscaled(@!x:scaled):integer; {$\lfloor x/2^{16}+.5\rfloor$} var @!be_careful:integer; {temporary register} begin if x>=half_unit then round_unscaled:=1+((x-half_unit) div unity) else if x>=-half_unit then round_unscaled:=0 else begin be_careful:=x+1; round_unscaled:=-(1+((-be_careful-half_unit) div unity)); end; end; @# function round_fraction(@!x:fraction):scaled; {$\lfloor x/2^{12}+.5\rfloor$} var @!be_careful:integer; {temporary register} begin if x>=2048 then round_fraction:=1+((x-2048) div 4096) else if x>=-2048 then round_fraction:=0 else begin be_careful:=x+1; round_fraction:=-(1+((-be_careful-2048) div 4096)); end; end; @* \[8] Algebraic and transcendental functions. \MP\ computes all of the necessary special functions from scratch, without relying on |real| arithmetic or system subroutines for sines, cosines, etc. @ To get the square root of a |scaled| number |x|, we want to calculate $s=\lfloor 2^8\!\sqrt x +{1\over2}\rfloor$. If $x>0$, this is the unique integer such that $2^{16}x-s\L s^2<2^{16}x+s$. The following subroutine determines $s$ by an iterative method that maintains the invariant relations $x=2^{46-2k}x_0\bmod 2^{30}$, $0 else begin k:=23; q:=2; while x|\unskip} begin decr(k); x:=x+x+x+x; end; if x; until k=0; square_rt:=halfp(q); end; end; @ @= begin if x<0 then begin print_err("Square root of "); @.Square root...replaced by 0@> print_scaled(x); print(" has been replaced by 0"); help2("Since I don't take square roots of negative numbers,")@/ ("I'm zeroing this one. Proceed, with fingers crossed."); error; end; square_rt:=0; end @ @= double(x); double(y); if x>=fraction_four then {note that |fraction_four=@t$2^{30}$@>|} begin x:=x-fraction_four; incr(y); end; double(x); y:=y+y-q; double(q); if x>=fraction_four then begin x:=x-fraction_four; incr(y); end; if y>q then begin y:=y-q; q:=q+2; end else if y<=0 then begin q:=q-2; y:=y+q; end; decr(k) @ Pythagorean addition $\psqrt{a^2+b^2}$ is implemented by an elegant iterative scheme due to Cleve Moler and Donald Morrison [{\sl IBM Journal @^Moler, Cleve Barry@> @^Morrison, Donald Ross@> of Research and Development\/ \bf27} (1983), 577--581]. It modifies |a| and~|b| in such a way that their Pythagorean sum remains invariant, while the smaller argument decreases. @p function pyth_add(@!a,@!b:integer):integer; label done; var @!r:fraction; {register used to transform |a| and |b|} @!big:boolean; {is the result dangerously near $2^{31}$?} begin a:=abs(a); b:=abs(b); if a0 then begin if a; if big then if a= loop@+ begin r:=make_fraction(b,a); r:=take_fraction(r,r); {now $r\approx b^2/a^2$} if r=0 then goto done; r:=make_fraction(r,fraction_four+r); a:=a+take_fraction(a+a,r); b:=take_fraction(b,r); end; done: @ Here is a similar algorithm for $\psqrt{a^2-b^2}$. It converges slowly when $b$ is near $a$, but otherwise it works fine. @p function pyth_sub(@!a,@!b:integer):integer; label done; var @!r:fraction; {register used to transform |a| and |b|} @!big:boolean; {is the input dangerously near $2^{31}$?} begin a:=abs(a); b:=abs(b); if a<=b then @ else begin if a; if big then a:=a+a; end; pyth_sub:=a; end; @ @= loop@+ begin r:=make_fraction(b,a); r:=take_fraction(r,r); {now $r\approx b^2/a^2$} if r=0 then goto done; r:=make_fraction(r,fraction_four-r); a:=a-take_fraction(a+a,r); b:=take_fraction(b,r); end; done: @ @= begin if a help2("Since I don't take square roots of negative numbers,")@/ ("I'm zeroing this one. Proceed, with fingers crossed."); error; end; a:=0; end @ The subroutines for logarithm and exponential involve two tables. The first is simple: |two_to_the[k]| equals $2^k$. The second involves a bit more calculation, which the author claims to have done correctly: |spec_log[k]| is $2^{27}$ times $\ln\bigl(1/(1-2^{-k})\bigr)= 2^{-k}+{1\over2}2^{-2k}+{1\over3}2^{-3k}+\cdots\,$, rounded to the nearest integer. @= @!two_to_the:array[0..30] of integer; {powers of two} @!spec_log:array[1..28] of integer; {special logarithms} @ @= @!k:integer; {all-purpose loop index} @ @= two_to_the[0]:=1; for k:=1 to 30 do two_to_the[k]:=2*two_to_the[k-1]; spec_log[1]:=93032640; spec_log[2]:=38612034; spec_log[3]:=17922280; spec_log[4]:=8662214; spec_log[5]:=4261238; spec_log[6]:=2113709; spec_log[7]:=1052693; spec_log[8]:=525315; spec_log[9]:=262400; spec_log[10]:=131136; spec_log[11]:=65552; spec_log[12]:=32772; spec_log[13]:=16385; for k:=14 to 27 do spec_log[k]:=two_to_the[27-k]; spec_log[28]:=1; @ Here is the routine that calculates $2^8$ times the natural logarithm of a |scaled| quantity; it is an integer approximation to $2^{24}\ln(x/2^{16})$, when |x| is a given positive integer. The method is based on exercise 1.2.2--25 in {\sl The Art of Computer Programming\/}: During the main iteration we have $1\L 2^{-30}x<1/(1-2^{1-k})$, and the logarithm of $2^{30}x$ remains to be added to an accumulator register called~$y$. Three auxiliary bits of accuracy are retained in~$y$ during the calculation, and sixteen auxiliary bits to extend |y| are kept in~|z| during the initial argument reduction. (We add $100\cdot2^{16}=6553600$ to~|z| and subtract 100 from~|y| so that |z| will not become negative; also, the actual amount subtracted from~|y| is~96, not~100, because we want to add~4 for rounding before the final division by~8.) @p function m_log(@!x:scaled):scaled; var @!y,@!z:integer; {auxiliary registers} @!k:integer; {iteration counter} begin if x<=0 then @ else begin y:=1302456956+4-100; {$14\times2^{27}\ln2\approx1302456956.421063$} z:=27595+6553600; {and $2^{16}\times .421063\approx 27595$} while xfraction_four+4 do @; m_log:=y div 8; end; end; @ @= begin z:=((x-1) div two_to_the[k])+1; {$z=\lceil x/2^k\rceil$} while x= begin print_err("Logarithm of "); @.Logarithm...replaced by 0@> print_scaled(x); print(" has been replaced by 0"); help2("Since I don't take logs of non-positive numbers,")@/ ("I'm zeroing this one. Proceed, with fingers crossed."); error; m_log:=0; end @ Conversely, the exponential routine calculates $\exp(x/2^8)$, when |x| is |scaled|. The result is an integer approximation to $2^{16}\exp(x/2^{24})$, when |x| is regarded as an integer. @p function m_exp(@!x:scaled):scaled; var @!k:small_number; {loop control index} @!y,@!z:integer; {auxiliary registers} begin if x>174436200 then {$2^{24}\ln((2^{31}-1)/2^{16})\approx 174436199.51$} begin arith_error:=true; m_exp:=el_gordo; end else if x<-197694359 then m_exp:=0 {$2^{24}\ln(2^{-1}/2^{16})\approx-197694359.45$} else begin if x<=0 then begin z:=-8*x; y:=@'4000000; {$y=2^{20}$} end else begin if x<=127919879 then z:=1023359037-8*x {$2^{27}\ln((2^{31}-1)/2^{20})\approx 1023359037.125$} else z:=8*(174436200-x); {|z| is always nonnegative} y:=el_gordo; end; @; if x<=127919879 then m_exp:=(y+8) div 16@+else m_exp:=y; end; end; @ The idea here is that subtracting |spec_log[k]| from |z| corresponds to multiplying |y| by $1-2^{-k}$. A subtle point (which had to be checked) was that if $x=127919879$, the value of~|y| will decrease so that |y+8| doesn't overflow. In fact, $z$ will be 5 in this case, and |y| will decrease by~64 when |k=25| and by~16 when |k=27|. @= k:=1; while z>0 do begin while z>=spec_log[k] do begin z:=z-spec_log[k]; y:=y-1-((y-two_to_the[k-1]) div two_to_the[k]); end; incr(k); end @ The trigonometric subroutines use an auxiliary table such that |spec_atan[k]| contains an approximation to the |angle| whose tangent is~$1/2^k$. @= @!spec_atan:array[1..26] of angle; {$\arctan2^{-k}$ times $2^{20}\cdot180/\pi$} @ @= spec_atan[1]:=27855475; spec_atan[2]:=14718068; spec_atan[3]:=7471121; spec_atan[4]:=3750058; spec_atan[5]:=1876857; spec_atan[6]:=938658; spec_atan[7]:=469357; spec_atan[8]:=234682; spec_atan[9]:=117342; spec_atan[10]:=58671; spec_atan[11]:=29335; spec_atan[12]:=14668; spec_atan[13]:=7334; spec_atan[14]:=3667; spec_atan[15]:=1833; spec_atan[16]:=917; spec_atan[17]:=458; spec_atan[18]:=229; spec_atan[19]:=115; spec_atan[20]:=57; spec_atan[21]:=29; spec_atan[22]:=14; spec_atan[23]:=7; spec_atan[24]:=4; spec_atan[25]:=2; spec_atan[26]:=1; @ Given integers |x| and |y|, not both zero, the |n_arg| function returns the |angle| whose tangent points in the direction $(x,y)$. This subroutine first determines the correct octant, then solves the problem for |0<=y<=x|, then converts the result appropriately to return an answer in the range |-one_eighty_deg<=@t$\theta$@><=one_eighty_deg|. (The answer is |+one_eighty_deg| if |y=0| and |x<0|, but an answer of |-one_eighty_deg| is possible if, for example, |y=-1| and $x=-2^{30}$.) The octants are represented in a ``Gray code,'' since that turns out to be computationally simplest. @d negate_x=1 @d negate_y=2 @d switch_x_and_y=4 @d first_octant=1 @d second_octant=first_octant+switch_x_and_y @d third_octant=first_octant+switch_x_and_y+negate_x @d fourth_octant=first_octant+negate_x @d fifth_octant=first_octant+negate_x+negate_y @d sixth_octant=first_octant+switch_x_and_y+negate_x+negate_y @d seventh_octant=first_octant+switch_x_and_y+negate_y @d eighth_octant=first_octant+negate_y @p function n_arg(@!x,@!y:integer):angle; var @!z:angle; {auxiliary register} @!t:integer; {temporary storage} @!k:small_number; {loop counter} @!octant:first_octant..sixth_octant; {octant code} begin if x>=0 then octant:=first_octant else begin negate(x); octant:=first_octant+negate_x; end; if y<0 then begin negate(y); octant:=octant+negate_y; end; if x else begin @; @; end; end; @ @= begin print_err("angle(0,0) is taken as zero"); @.angle(0,0)...zero@> help2("The `angle' between two identical points is undefined.")@/ ("I'm zeroing this one. Proceed, with fingers crossed."); error; n_arg:=0; end @ @= case octant of first_octant:n_arg:=z; second_octant:n_arg:=ninety_deg-z; third_octant:n_arg:=ninety_deg+z; fourth_octant:n_arg:=one_eighty_deg-z; fifth_octant:n_arg:=z-one_eighty_deg; sixth_octant:n_arg:=-z-ninety_deg; seventh_octant:n_arg:=z-ninety_deg; othercases n_arg:=-z; { |eighth_octant| } end @ At this point we have |x>=y>=0|, and |x>0|. The numbers are scaled up or down until $2^{28}\L x<2^{29}$, so that accurate fixed-point calculations will be made. @= while x>=fraction_two do begin x:=halfp(x); y:=halfp(y); end; z:=0; if y>0 then begin while x; end @ During the calculations of this section, variables |x| and~|y| represent actual coordinates $(x,2^{-k}y)$. We will maintain the condition |x>=y|, so that the tangent will be at most $2^{-k}$. If $x<2y$, the tangent is greater than $2^{-k-1}$. The transformation $(a,b)\mapsto(a+b\tan\phi,b-a\tan\phi)$ replaces $(a,b)$ by coordinates whose angle has decreased by~$\phi$; in the special case $a=x$, $b=2^{-k}y$, and $\tan\phi=2^{-k-1}$, this operation reduces to the particularly simple iteration shown here. [Cf.~John E. Meggitt, @^Meggitt, John E.@> {\sl IBM Journal of Research and Development\/ \bf6} (1962), 210--226.] The initial value of |x| will be multiplied by at most $(1+{1\over2})(1+{1\over8})(1+{1\over32})\cdots\approx 1.7584$; hence there is no chance of integer overflow. @= k:=0; repeat double(y); incr(k); if y>x then begin z:=z+spec_atan[k]; t:=x; x:=x+(y div two_to_the[k+k]); y:=y-t; end; until k=15; repeat double(y); incr(k); if y>x then begin z:=z+spec_atan[k]; y:=y-x; end; until k=26 @ Conversely, the |n_sin_cos| routine takes an |angle| and produces the sine and cosine of that angle. The results of this routine are stored in global integer variables |n_sin| and |n_cos|. @= @!n_sin,@!n_cos:fraction; {results computed by |n_sin_cos|} @ Given an integer |z| that is $2^{20}$ times an angle $\theta$ in degrees, the purpose of |n_sin_cos(z)| is to set |x=@t$r\cos\theta$@>| and |y=@t$r\sin\theta$@>| (approximately), for some rather large number~|r|. The maximum of |x| and |y| will be between $2^{28}$ and $2^{30}$, so that there will be hardly any loss of accuracy. Then |x| and~|y| are divided by~|r|. @p procedure n_sin_cos(@!z:angle); {computes a multiple of the sine and cosine} var @!k:small_number; {loop control variable} @!q:0..7; {specifies the quadrant} @!r:fraction; {magnitude of |(x,y)|} @!x,@!y,@!t:integer; {temporary registers} begin while z<0 do z:=z+three_sixty_deg; z:=z mod three_sixty_deg; {now |0<=z; @; r:=pyth_add(x,y); n_cos:=make_fraction(x,r); n_sin:=make_fraction(y,r); end; @ In this case the octants are numbered sequentially. @= case q of 0:do_nothing; 1:begin t:=x; x:=y; y:=t; end; 2:begin t:=x; x:=-y; y:=t; end; 3:negate(x); 4:begin negate(x); negate(y); end; 5:begin t:=x; x:=-y; y:=-t; end; 6:begin t:=x; x:=y; y:=-t; end; 7:negate(y); end {there are no other cases} @ The main iteration of |n_sin_cos| is similar to that of |n_arg| but applied in reverse. The values of |spec_atan[k]| decrease slowly enough that this loop is guaranteed to terminate before the (nonexistent) value |spec_atan[27]| would be required. @= k:=1; while z>0 do begin if z>=spec_atan[k] then begin z:=z-spec_atan[k]; t:=x;@/ x:=t+y div two_to_the[k]; y:=y-t div two_to_the[k]; end; incr(k); end; if y<0 then y:=0 {this precaution may never be needed} @ And now let's complete our collection of numeric utility routines by considering random number generation. \MP\ generates pseudo-random numbers with the additive scheme recommended in Section 3.6 of {\sl The Art of Computer Programming}; however, the results are random fractions between 0 and |fraction_one-1|, inclusive. There's an auxiliary array |randoms| that contains 55 pseudo-random fractions. Using the recurrence $x_n=(x_{n-55}-x_{n-31})\bmod 2^{28}$, we generate batches of 55 new $x_n$'s at a time by calling |new_randoms|. The global variable |j_random| tells which element has most recently been consumed. The global variable |sys_random_seed| was introduced in version 0.9, for the sole reason of stressing the fact that the initial value of the random seed is system-dependant. The pascal code below will initialize this variable to |(internal[time] div unity)+internal[day]|, but this is not good enough on modern fast machines that are capable of running multiple MetaPost processes within the same second. @^system dependencies@> @= @!randoms:array[0..54] of fraction; {the last 55 random values generated} @!j_random:0..54; {the number of unused |randoms|} @!sys_random_seed:scaled; {the default random seed} @ To consume a random fraction, the program below will say `|next_random|' and then it will fetch |randoms[j_random]|. @d next_random==if j_random=0 then new_randoms else decr(j_random) @p procedure new_randoms; var @!k:0..54; {index into |randoms|} @!x:fraction; {accumulator} begin for k:=0 to 23 do begin x:=randoms[k]-randoms[k+31]; if x<0 then x:=x+fraction_one; randoms[k]:=x; end; for k:=24 to 54 do begin x:=randoms[k]-randoms[k-24]; if x<0 then x:=x+fraction_one; randoms[k]:=x; end; j_random:=54; end; @ To initialize the |randoms| table, we call the following routine. @p procedure init_randoms(@!seed:scaled); var @!j,@!jj,@!k:fraction; {more or less random integers} @!i:0..54; {index into |randoms|} begin j:=abs(seed); while j>=fraction_one do j:=halfp(j); k:=1; for i:=0 to 54 do begin jj:=k; k:=j-k; j:=jj; if k<0 then k:=k+fraction_one; randoms[(i*21)mod 55]:=j; end; new_randoms; new_randoms; new_randoms; {``warm up'' the array} end; @ To produce a uniform random number in the range |0<=u=u>x| or |0=u=x|, given a |scaled| value~|x|, we proceed as shown here. Note that the call of |take_fraction| will produce the values 0 and~|x| with about half the probability that it will produce any other particular values between 0 and~|x|, because it rounds its answers. @p function unif_rand(@!x:scaled):scaled; var @!y:scaled; {trial value} begin next_random; y:=take_fraction(abs(x),randoms[j_random]); if y=abs(x) then unif_rand:=0 else if x>0 then unif_rand:=y else unif_rand:=-y; end; @ Finally, a normal deviate with mean zero and unit standard deviation can readily be obtained with the ratio method (Algorithm 3.4.1R in {\sl The Art of Computer Programming\/}). @p function norm_rand:scaled; var @!x,@!u,@!l:integer; {what the book would call $2^{16}X$, $2^{28}U$, and $-2^{24}\ln U$} begin repeat repeat next_random; x:=take_fraction(112429,randoms[j_random]-fraction_half); {$2^{16}\sqrt{8/e}\approx 112428.82793$} next_random; u:=randoms[j_random]; until abs(x)=0; norm_rand:=x; end; @* \[9] Packed data. In order to make efficient use of storage space, \MP\ bases its major data structures on a |memory_word|, which contains either a (signed) integer, possibly scaled, or a small number of fields that are one half or one quarter of the size used for storing integers. If |x| is a variable of type |memory_word|, it contains up to four fields that can be referred to as follows: $$\vbox{\halign{\hfil#&#\hfil&#\hfil\cr |x|&.|int|&(an |integer|)\cr |x|&.|sc|\qquad&(a |scaled| integer)\cr |x.hh.lh|, |x.hh|&.|rh|&(two halfword fields)\cr |x.hh.b0|, |x.hh.b1|, |x.hh|&.|rh|&(two quarterword fields, one halfword field)\cr |x.qqqq.b0|, |x.qqqq.b1|, |x.qqqq|&.|b2|, |x.qqqq.b3|\hskip-100pt &\qquad\qquad\qquad(four quarterword fields)\cr}}$$ This is somewhat cumbersome to write, and not very readable either, but macros will be used to make the notation shorter and more transparent. The \PASCAL\ code below gives a formal definition of |memory_word| and its subsidiary types, using packed variant records. \MP\ makes no assumptions about the relative positions of the fields within a word. Since we are assuming 32-bit integers, a halfword must contain at least 16 bits, and a quarterword must contain at least 8 bits. @^system dependencies@> But it doesn't hurt to have more bits; for example,