[texhax] Any crazy math formulas for testing a TeX language interpreter

David Carlisle d.p.carlisle at gmail.com
Tue Jan 12 09:01:23 CET 2016


On 12 January 2016 at 01:27, Douglas McKenna <doug at mathemaesthetics.com>
wrote:

> -------------------------
> David Carlisle wrote -
>
> > is there any reason not to use latex (in particular does latex.ltx not
> load?)
>
> LaTeX's code can be read and compiled, but JSBox doesn't yet have the
> ability to accurately find the external files in the/a distribution tree
> that LaTeX depends on when executed, and LaTeX tries to load quite a few.
> The problem one has as a tester is that the moment a file is not found, all
> bets are off on whether things are working "correctly"/"compatibly" later
> on.  I'm slowly wrapping my head around solving this problem (yes, I know
> about kpathsea, I just don't understand it very well).   If it's not too
> many files, I can place copies of them into the job's working directory,
> and get things read that way, at least for one particular test job.
>

Actually for the base latex tests that should work well as it is designed
to run with a sandboxed tex input path (directory) _not_ pulling in the
local tex tree, so we know what we are testing.

>
> > I ask as we've done a lot of work over the last couple of years to make
> the core latex regression test suite (that basically runs a set of tests
> and compares normalised log files against stored reference versions)  work
> with multiple engines (currently it passes with (pdf)tex, luatex and xetex
> (and has known behaviour with (u)ptex)
> > so it would be interesting to see what happens if you ran the test suite
> with jsbox.
>
> JSBox log files won't be usefully diff-able with TeX log files, except in
> a few areas.  But that's okay with me, even though it's initially a pain.
> JSBox tracing and error messages are not going to use the same format as
> TeX's.
>

luatex's aren't much the same either, but even if they are completely
different, like any regression test suite you only need to sign off the
base results once and engine-specific logs can be stored, then it will flag
if something changes.

>
> With that said, the moment it appears that JSBox can run LaTeX
> (functionally if not literally), it will be very appropriate to pit any
> official test suite against it.  Even though (so far, as a private project)
> JSBox doesn't have to be, I really want and need it to be compatible.
>

Understood.

PS I feel I have to ask if you could change the name, which would be _very_
confusing: anyone who sees that name is going to assume it is related to
JavaScript and the css box model, but I gather it's related to neither. (I
would have assumed that even if there were not already an existing jsbox
project on github that is about exactly that.)


>
> -------------------------
> Jim Hefferon wrote -
>
> > I have a file that includes the test formulas for the original Boston
> Computer Society torture test that appeared in the Notices.  It is LaTeX
> but it would be easy enough to strip out the formulas.
> >
> > http://joshua.smcvt.edu/bcs/
>
> Many thanks!  This looks like it will be helpful.  I'll report back about
> what I find out using it.
>
> -------------------------
> Justin Bailey wrote -
>
> > * "TeX for the Impatient"? The source for that book is distributed
> > with TeX Live.
> > * CWEB source for TeX. It's pretty easy to generate the TeX sources as
> > I remember.
> > * MetaPost is written in CWEB - you could try that (see my blog post
> > http://blog.codeslower.com/2012/6/METAPOST-The-Program).
>
> I think an entire book is not going to be helpful right now.  One small
> incremental step at a time is better.
>
> When I first got my own monograph working a few weeks ago, everything
> looked great.  But then at 1000% magnification I noticed that there was a
> slight visual discrepancy in the smallest of inline fractions, where the
> fraction bar was too close to the numerator, even though it looked correct
> in larger displays.  This took three days to debug, eventually turning out
> to be a vertical coordinate update statement that I had misplaced from
> before an export to after an export.  Sigh.  Things like that never show up
> in log files.
>

If you have the analogue of \showbox, that sort of thing is exactly what
shows up in the latex test suite logs. The tests are not just normal
documents; they are designed to put information in the log file.
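
As a rough illustration (a sketch of the idea only, not a file taken from the
latex test suite; the formula and box register are arbitrary choices), a plain
TeX test of this kind could look like:

```tex
% Minimal plain TeX sketch: dump the structure of a box into the log.
\scrollmode                 % do not pause at the "OK" message \showbox raises
\tracingonline=0            % send the box listing to the log file only
\showboxbreadth=\maxdimen   % show every item in each list
\showboxdepth=\maxdimen     % show every nesting level
\setbox0=\hbox{$1\over2$}   % a small inline fraction
\showbox0                   % lists the boxes, rule, kerns and shifts in the log
\bye
```

The listing records the size and shift of the numerator, the rule and the
denominator, so a fraction bar that moves should change the log and make the
comparison against the stored reference fail.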



>
> > Very exciting to see more news on JSBox - I attended TUG 2014 and your
> > talk was one of the highlights.
>
> Thanks.
>
> > If you give another talk please advertise!
>
> Will do.
>
> > Congrats on continuing to move forward - it looked like a monumental
> effort!
>
> My head has come near to exploding several times. :)
>

I think a lot of people have been looking out for more news on this (the
system, not your head exploding:-)


>
> -------------------------
> Joseph Wright wrote -
>
> > (Aside: I'd be very keen to know about
>
> > primitive coverage beyond TeX90, particularly e-TeX, \pdfstrcmp or
> > equivalent and Unicode-related primitives, in particular \Uchar and
> > \Ucharcat. See expl3 for why these are important.)
>
>
> These are not part of e-TeX, so I've not spent any time thinking about
> them.  String comparison in Unicode is a giant ball of wax, of course.
>
> All internal data structures in JSBox traffic in 21-bit Unicode
> characters, either directly or as UTF-8 sequences, converted back and forth
> as needed.  JSBox, when enabled at run-time to handle Unicode, simply
> accepts any legal "character" code point, from 0 to "10FFFF (except in some
> cases the surrogate code values), wherever classic TeX/eTeX accepts an
> 8-bit character code.  So \char256 would be legal, even though in TeX/eTeX
> emulation mode it would generate an error message.  Same for \catcode, or
> other mapping commands.  JSBox supports \catcode assignment internally for
> all Unicode code points, and simply limits \catcode to 8-bit character
> values when Unicode is disabled.  So there's no need for \Uchar or
> \Ucharcat, which would appear to have been implemented so as not to disturb
> TeX's \char and \catcode implementation code.
>


No, xetex and luatex accept \char and \catcode just as you describe, but
\Uchar and \Ucharcat provide new and much needed functionality.
\char is not expandable so cannot be used to generate character tokens. In
classic tex you can feasibly cover the entire range with a 256-case \ifcase
(look at the latex \alph, which typesets a counter as a,b,c...), but for
Unicode TeX that is not feasible and so \Uchar is really essential for many
use cases.
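
A small sketch of the difference, assuming plain format on a recent xetex or
luatex; the macro names \lettername, \x, \y and \z are made up for this
example:

```tex
% Sketch for plain XeTeX or LuaTeX; all macro names here are hypothetical.
\scrollmode   % keep \show from pausing the run

% Classic approach: an expandable lookup via \ifcase (the pattern behind \alph).
\def\lettername#1{\ifcase#1\or a\or b\or c\fi}
\edef\x{\lettername{2}}   % \x expands to the character token b
\show\x

% \char is not expandable, so \edef stores the primitive and its number.
\edef\y{\char"3B1 }
\show\y

% \Uchar is expandable, so \edef stores one character token for U+03B1.
\edef\z{\Uchar"3B1 }
\show\z
\bye
```

The \ifcase table works for a handful of values, or at a stretch for 256, but
it does not scale to the full Unicode range, which is the gap \Uchar fills.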




> With that said, it would be a piece of cake to implement \Uchar and
> \Ucharcat as primitives, should that prove necessary for some kind of
> compatibility that I don't understand.  Or as wrapper macros in a format
> library.
>

As noted above, they are essential for many processing requirements for
latex or similar formats over a Unicode range.

Similarly for \mathchar where you can not fit in all the extra bits without
some syntax change, please use \Umathchar which is implemented in a
compatible way in luatex and xetex which makes things much easier for
formats like latex that have to try and work across all these engines.
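
For illustration, a sketch of the three-number form (class, family, code
point) as I understand it to work in both engines; the name \malpha and the
choice of family 1 are assumptions made for the example:

```tex
% Sketch for plain XeTeX or LuaTeX: class, family, Unicode code point.
% Class 0 (ordinary), family 1, U+1D6FC (mathematical italic small alpha).
\Umathchardef\malpha="0 "1 "1D6FC
% The same three-number form assigns a math code to a character.
\Umathcode`x="0 "1 "1D465
```

Actually typesetting these needs a font in the chosen family that contains
the glyphs.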


>
> > I'm not clear on the 'no DVI or PDF' business: what *does* it produce
> > and how does one save/print/transmit it?
>
> On behalf of any client program to which it is linked, JSBox instantiates
> zero or more TeX language interpreters in the client program's memory.  The
> client then configures the interpreter, setting feature levels, installing
> hyphenation databases, telling it the default newline character(s) desired,
> etc.  Each interpreter exists to run zero or more jobs.  Each job consists
> of zero or more runs (pushing source code onto the execution stack and
> executing it until the stack is empty).  Typically, the first run loads a
> format from source code, and the second the job from a source file.  Each
> job creates a final set of zero or more pages in memory.  Each page records
> the \box255 shipped out, and all its contents.  TeX recycles all data
> structures after shipping a page out and forgets the page.  But JSBox saves
> everything (executing output nodes once when the page is recorded).  In
> particular, JSBox saves the page set after the job has ended, so that the
> still extant interpreter can export the pages (in any order) as many times
> as requested, up until the next job is started, when all data structures
> are recycled, or when the interpreter is destroyed, when all allocations
> are freed back to the client.
>
> After a job is done, the client program simply asks the interpreter to
> export the typeset pages in some requested/supported fashion.  All writing
> of anything (terminal text, log files, \write files, exporting a DVI opcode
> stream, etc.) goes through the client program and is under its watchful
> eye.  The document can be "transmitted" over and over again to the client
> program as the user browses the document in, e.g., an online viewer.  The
> client program is then in charge of deciding whether or how to save the
> typesetting information representing the document; the whole point is that
> none of this should be the typesetting engine's business.  If a client
> program wants to create its own private document format, it can.  Or the
> client can use a documented interchange format like DVI or PDF or
> whatever.  JSBox doesn't care, although for things like DVI, it might as
> well implement the tools on behalf of any client program.
>
> The client program can record and/or display the exported items (glyphs,
> boxes, rules, specials, etc.) on a simulated page (or write them to a
> file).  If the client program wants to enable printing, it can draw the
> items into their anointed positions on a simulated printer page, and the
> system's printer driver will send it all out to a printer (that's the idea;
> none of that is implemented in my particular client program yet).  The
> point is that a client program, such as the eMonograph reader I'm working
> on, has the option to allow printing or copying the content, or exporting
> the content in some other format.
>
> Indeed, printing my particular monograph will lose a lot of information.
> Many of its mathematical illustrations are going to be dynamic, to be
> animated by the client program at read time, not typesetting time.  This
> means a particular document can morph into a computer program dedicated to
> displaying just that document interactively.  For several reasons, I need
> this capability more than I need PDF output.  That can come later (or I can
> just go back to using pdftex :-)
>

This all sounds excellent:-)


>
> ---------------------------
> Karl Berry wrote -
>
> > Doug - just ideas, sorry for not doing any research:
>
> No problem.  All interesting suggestions I'll keep on tap. Thanks.
>
> > - However, I fear nothing extant is going to explicitly test all the
> > myriad features of TeX math typesetting in the way a test suite "should"
> > (= separately, with known-good results and variations).
>
> One of the things I've implemented in JSBox is the ability to
> conditionally compile in profiling code, which tells me which primitives
> have been called, how many times, and which have never been called.  This
> can help in fashioning a decent test suite.
>


We could usefully use that to test the coverage of the latex test suite.


>
> Doug McKenna
> Mathemaesthetics, Inc.
>

David