[texhax] Any crazy math formulas for testing a TeX language interpreter

Douglas McKenna doug at mathemaesthetics.com
Tue Jan 12 02:27:51 CET 2016


-------------------------
David Carlisle wrote -

> is there any reason not to use latex (in particular does latex.ltx not load?)

LaTeX's code can be read and compiled, but JSBox doesn't yet have the ability to accurately find the external files in a distribution tree that LaTeX depends on when executed, and LaTeX tries to load quite a few.  The problem one has as a tester is that the moment a file is not found, all bets are off on whether things are working "correctly"/"compatibly" later on.  I'm slowly wrapping my head around solving this problem (yes, I know about kpathsea, I just don't understand it very well).  If it's not too many files, I can place copies of them into the job's working directory, and get things read that way, at least for one particular test job.
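
As a rough illustration of that working-directory workaround, here is a minimal Python sketch. It is not JSBox code and it is not how kpathsea searches; the function name `resolve_inputs` and the file names in the usage note are hypothetical. It only checks which requested files have copies in one directory and reports the rest, so a tester knows up front that later results cannot be trusted.

```python
from pathlib import Path


def resolve_inputs(names, workdir):
    """Split requested file names into those found in workdir and those missing.

    Assumption: a flat lookup in a single directory, with no search path
    and no default-extension handling.
    """
    workdir = Path(workdir)
    found, missing = {}, []
    for name in names:
        candidate = workdir / name
        if candidate.is_file():
            found[name] = candidate
        else:
            missing.append(name)
    return found, missing
```

A test job could call, say, `resolve_inputs(["article.cls", "size10.clo"], ".")` before starting, and refuse to draw conclusions from the run if the second returned list is not empty.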

> I ask as we've done a lot of work over the last couple of years to make the core latex regression test suite (that basically runs a set of tests and compares normalised log files against stored reference versions) work with multiple engines (currently it passes with (pdf)tex, luatex and xetex, and has known behaviour with (u)ptex)
> so it would be interesting to see what happens if you ran the test suite with jsbox.

JSBox log files won't be usefully diff-able with TeX log files, except in a few areas.  But that's okay with me, even though it's initially a pain.  JSBox tracing and error messages are not going to use the same format as TeX's.

With that said, the moment it appears that JSBox can run LaTeX (functionally if not literally), it will be very appropriate to pit any official test suite against it.  Even though (so far, as a private project) JSBox doesn't have to be, I really want and need it to be compatible.

-------------------------
Jim Hefferon wrote -

> I have a file that includes the test formulas for the original Boston Computer Society torture test that appeared in the Notices.  It is LaTeX but it would be easy enough to strip out the formulas.
> 
> http://joshua.smcvt.edu/bcs/

Many thanks!  This looks like it will be helpful.  I'll report back about what I find out using it.

-------------------------
Justin Bailey wrote -

> * "TeX for the Impatient"? The source for that book is distributed
> with TeX Live.
> * CWEB source for TeX. It's pretty easy to generate the TeX sources as
> I remember.
> * MetaPost is written in CWEB - you could try that (see my blog post
> http://blog.codeslower.com/2012/6/METAPOST-The-Program).

I think an entire book is not going to be helpful right now.  One small incremental step at a time is better.

When I first got my own monograph working a few weeks ago, everything looked great.  But then at 1000% magnification I noticed that there was a slight visual discrepancy in the smallest of inline fractions, where the fraction bar was too close to the numerator, even though it looked correct in larger displays.  This took three days to debug, eventually turning out to be a vertical coordinate update statement that I had misplaced from before an export to after an export.  Sigh.  Things like that never show up in log files.

> Very exciting to see more news on JSBox - I attended TUG 2014 and your
> talk was one of the highlights.

Thanks.

> If you give another talk please advertise!

Will do.

> Congrats on continuing to move forward - it looked like a monumental effort!

My head has come near to exploding several times. :)

-------------------------
Joseph Wright wrote -

> (Aside: I'd be very keen to know about
> primitive coverage beyond TeX90, particularly e-TeX, \pdfstrcmp or
> equivalent and Unicode-related primitives, in particular \Uchar and
> \Ucharcat. See expl3 for why these are important.)


These are not part of e-TeX, so I've not spent any time thinking about them.  String comparison in Unicode is a giant ball of wax, of course.

All internal data structures in JSBox traffic in 21-bit Unicode characters, either directly or as UTF-8 sequences, converted back and forth as needed.  JSBox, when enabled at run-time to handle Unicode, simply accepts any legal "character" code point, from 0 to "10FFFF (except in some cases the surrogate code values), wherever classic TeX/eTeX accepts an 8-bit character code.  So \char256 would be legal, even though in TeX/eTeX emulation mode it would generate an error message.  Same for \catcode, or other mapping commands.  JSBox supports \catcode assignment internally for all Unicode code points, and simply limits \catcode to 8-bit character values when Unicode is disabled.  So there's no need for \Uchar or \Ucharcat, which would appear to have been implemented so as not to disturb TeX's \char and \catcode implementation code.
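
The acceptance rule just described can be written down in a few lines. This is a minimal Python sketch of the rule as stated, not JSBox's actual code; the names `is_legal_char_code`, `unicode_enabled` and `allow_surrogates` are hypothetical, and rejecting surrogates by default is an assumption standing in for "except in some cases".

```python
MAX_CODE_POINT = 0x10FFFF            # "10FFFF in TeX notation
SURROGATES = range(0xD800, 0xE000)   # UTF-16 surrogate code values


def is_legal_char_code(code, unicode_enabled, allow_surrogates=False):
    """Decide whether `code` is accepted where classic TeX takes a character code."""
    if not unicode_enabled:
        # TeX/e-TeX emulation: 8-bit character codes only.
        return 0 <= code <= 255
    if not 0 <= code <= MAX_CODE_POINT:
        return False
    if code in SURROGATES and not allow_surrogates:
        return False
    return True


def to_utf8(code):
    """Encode one code point as a UTF-8 byte sequence (Python raises for surrogates)."""
    return chr(code).encode("utf-8")
```

Under these assumptions, `is_legal_char_code(256, True)` is true while `is_legal_char_code(256, False)` is false, which mirrors the \char256 case above.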

With that said, it would be a piece of cake to implement \Uchar and \Ucharcat as primitives, should that prove necessary for some kind of compatibility that I don't understand.  Or as wrapper macros in a format library.

> I'm not clear on the 'no DVI or PDF' business: what *does* it produce
> and how does one save/print/transmit it?

On behalf of any client program to which it is linked, JSBox instantiates zero or more TeX language interpreters in the client program's memory.  The client then configures the interpreter, setting feature levels, installing hyphenation databases, telling it the default newline character(s) desired, etc.  Each interpreter exists to run zero or more jobs.  Each job consists of zero or more runs (pushing source code onto the execution stack and executing it until the stack is empty).  Typically, the first run loads a format from source code, and the second the job from a source file.

Each job creates a final set of zero or more pages in memory.  Each page records the \box255 shipped out, and all its contents.  TeX recycles all data structures after shipping a page out and forgets the page.  But JSBox saves everything (executing output nodes once when the page is recorded).  In particular, JSBox saves the page set after the job has ended, so that the still extant interpreter can export the pages (in any order) as many times as requested, up until the next job is started, when all data structures are recycled, or when the interpreter is destroyed, when all allocations are freed back to the client.
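
To make the interpreter/job/run/page lifecycle concrete, here is a toy Python model. It is only a sketch of the behavior described above: the class and method names are hypothetical, nothing here is JSBox's API, and "typesetting" is faked by treating any source line that starts with `PAGE ` as a shipped-out page.

```python
class Interpreter:
    """Toy stand-in for one TeX language interpreter instance."""

    def __init__(self):
        self.pages = []          # pages recorded by the current or most recent job
        self.job_active = False

    def start_job(self):
        # Starting the next job recycles everything the previous job left behind.
        self.pages = []
        self.job_active = True

    def run(self, source):
        """One run: consume `source` until nothing is left to execute."""
        if not self.job_active:
            raise RuntimeError("run() called outside a job")
        for line in source.splitlines():
            # Toy convention (not TeX syntax): a "PAGE <text>" line ships out a page.
            if line.startswith("PAGE "):
                self.pages.append(line[len("PAGE "):])

    def end_job(self):
        # Pages are deliberately kept after the job ends.
        self.job_active = False

    def export(self, order=None):
        """Return the saved pages, in any requested order, as often as asked."""
        if order is None:
            order = range(len(self.pages))
        return [self.pages[i] for i in order]
```

In this model a job with two runs (a format, then the document) leaves its pages available to `export()` any number of times, and they disappear only when `start_job()` is called again.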

After a job is done, the client program simply asks the interpreter to export the typeset pages in some requested/supported fashion.  All writing of anything (terminal text, log files, \write files, exporting a DVI opcode stream, etc.) goes through the client program and is under its watchful eye.  The document can be "transmitted" over and over again to the client program as the user browses the document in, e.g., an online viewer.  The client program is then in charge of deciding whether or how to save the typesetting information representing the document; the whole point is that none of this should be the typesetting engine's business.  If a client program wants to create its own private document format, it can.  Or the client can use a documented interchange format like DVI or PDF or whatever.  JSBox doesn't care, although for things like DVI, it might as well implement the tools on behalf of any client program.

The client program can record and/or display the exported items (glyphs, boxes, rules, specials, etc.) on a simulated page (or write them to a file).  If the client program wants to enable printing, it can draw the items into their anointed positions on a simulated printer page, and the system's printer driver will send it all out to a printer (that's the idea; none of that is implemented in my particular client program yet).  The point is that a client program, such as the eMonograph reader I'm working on, has the option to allow printing or copying the content, or exporting the content in some other format.
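
The division of labor between engine and client can be sketched as follows, in Python. This is an illustration only: the item types `Glyph` and `Rule`, the `RecordingClient` class, its `place` method and the `export_pages` function are all hypothetical names, and the real set of exported items is richer (boxes, specials, etc.).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Glyph:
    x: float
    y: float
    font: str
    code: int


@dataclass(frozen=True)
class Rule:
    x: float
    y: float
    width: float
    height: float


class RecordingClient:
    """A client that simply records what it is handed; it could draw or write instead."""

    def __init__(self):
        self.placed = []

    def place(self, page_index, item):
        self.placed.append((page_index, item))


def export_pages(pages, client, order=None):
    """Hand every item of the selected pages to the client, in the requested order."""
    if order is None:
        order = range(len(pages))
    for page_index in order:
        for item in pages[page_index]:
            client.place(page_index, item)
```

The engine side never decides what a page becomes; a different client could implement `place` to draw on screen, emit DVI, or feed a printer page, and `export_pages` can be called again whenever the user revisits a page.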

Indeed, printing my particular monograph will lose a lot of information.  Many of its mathematical illustrations are going to be dynamic, to be animated by the client program at read time, not typesetting time.  This means a particular document can morph into a computer program dedicated to displaying just that document interactively.  For several reasons, I need this capability more than I need PDF output.  That can come later (or I can just go back to using pdftex :-)

---------------------------
Karl Berry wrote -

> Doug - just ideas, sorry for not doing any research:

No problem.  All interesting suggestions I'll keep on tap. Thanks.

> - However, I fear nothing extant is going to explicitly test all the
> myriad features of TeX math typesetting in the way a test suite "should"
> (= separately, with known-good results and variations).

One of the things I've implemented in JSBox is the ability to conditionally compile in profiling code, which tells me which primitives have been called, how many times, and which have never been called.  This can help in fashioning a decent test suite.
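
The idea is easy to sketch. The Python below is not the JSBox profiling code; the `PROFILING` flag stands in for a compile-time switch, and `PrimitiveTable` with its methods is a hypothetical name for whatever dispatches primitives.

```python
from collections import Counter

PROFILING = True  # stand-in for conditionally compiled profiling code


class PrimitiveTable:
    """Dispatch table that can count how often each primitive is invoked."""

    def __init__(self):
        self._handlers = {}
        self.calls = Counter()

    def register(self, name, handler):
        self._handlers[name] = handler

    def call(self, name, *args):
        if PROFILING:
            self.calls[name] += 1
        return self._handlers[name](*args)

    def never_called(self):
        """Registered primitives that no test input has exercised yet."""
        return sorted(n for n in self._handlers if self.calls[n] == 0)
```

After running a candidate test file, the list from `never_called()` shows which primitives still need a test written for them.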

Doug McKenna
Mathemaesthetics, Inc.



