[pdftex] Caching intermediate compilation results for near-real-time PDF re-renders during editing

Ross Moore ozross at icloud.com
Sun Jan 16 01:20:56 CET 2022


Hi Jamie.

I think it is time to recall some history, from 20+ years ago.

Here’s an advert, in TUGboat from March 1999:
https://tug.org/TUGboat/tb20-1/bluesky.pdf

The product “Lightning Textures” had two really great features to make editing
easier:  Flash Mode, and Synchronicity.
These were built into a MacOS Classic application which handled all three tasks:
 Editing, Compilation with TeX, and Preview.

Back then, the Preview was using an (extended) DVI format, rather than PDF.

Flash Mode started re-compilation automatically upon keystrokes in the editing window.

Synchronicity gave a 2-way correlation between content in the Preview window
and position within the Editing window.

It was this latter feature that inspired SyncTeX, which is available via the  -synctex  option
of current TeX binaries, on all platforms.
But SyncTeX (for PDF output) has never worked as well as Synchronicity did
for Textures’ extended DVI format.

These features were the brain-child of Barry Smith, who was a founder of BlueSky Research
but is no longer with us. Here’s a link that may help in a search to discover more:
https://tex.stackexchange.com/questions/108497/what-happened-to-textures-and-bluesky-research

There are a multitude of reasons why this work never took hold generally;
e.g.
1.  it was proprietary software
2.  output was based upon DVI, not the final PDF
3.  pdftex  was being produced around the same time
4.  Apple changed the  MacOS  operating system
5.  change of direction within BlueSky Research
6.  passing of Barry Smith


To understand why this is likely relevant to the current discussion, let’s
look at how Synchronicity worked, using the extended DVI format.

Each word in the binary DVI output was 64 bits (2 x 32), where normal
DVI is based upon 32-bit words.  (It was 20+ years ago; maybe I’ve doubled
the bit-lengths, but that doesn’t change the discussion.)
The extra bits of each word contained an address indicating where
in the source document(s) that letter/character in the Preview originated.
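
To make this concrete, here is a rough guess, in C, at what such an extended
DVI word might have looked like.  The real Lightning Textures layout was never
published, so the type and field names below are purely hypothetical:

    #include <stdint.h>

    /* Hypothetical layout: each ordinary 32-bit DVI word is paired with a
       32-bit "source address" recording where in the input the material it
       typesets came from. */
    typedef struct {
        uint32_t dvi_payload;  /* the ordinary 32-bit DVI word                */
        uint32_t source_addr;  /* where in the source this came from, e.g. an
                                  input-file index packed with a char offset  */
    } extended_dvi_word;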

Thus a (Ctrl-)click on a character in the Preview allowed the App to take you directly
to the place in the source it had come from.
And since the App was also in control of the Editor, it could similarly go the other way.
A (Ctrl-)click in the source would take you to the exact spot in the Preview.

There’s a complication for characters that come from the expansion of macros.
Presumably the place where the macro is invoked provides the desired target address.


For this discussion …

 ... given a sequence of snapshots (which Textures did not have),
when editing starts one could use the Synchronicity information to identify
exactly which snapshot is the first one affected.
Then Synchronicity in the other direction can identify where, earlier
within the source, compilation needs to restart to pick up the fresh changes.
The Flash-mode aspect can then kick in, but now having first loaded everything
needed to restore the state at that earlier point in the input, re-compiling from there.
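
Here is a minimal sketch, in C, of that snapshot-selection step.  It is purely
illustrative: snapshot_t, restore_state and recompile_from are invented names,
not anything that Textures provided or that pdfTeX provides today.

    #include <stddef.h>

    typedef struct {
        long source_offset;  /* how far into the source this snapshot had read */
        /* ... plus the saved engine state: registers, macro memory,
               open files and their current positions, and so on ...           */
    } snapshot_t;

    /* Pick the last snapshot taken strictly before the first edited byte,
       i.e. the latest state that the edit cannot yet have influenced.
       Snapshots are assumed to be in reading order, with snaps[0] taken at
       offset 0 (the very start of the run), so an answer always exists. */
    static size_t pick_restart(const snapshot_t *snaps, size_t n, long edit_offset)
    {
        size_t best = 0;
        for (size_t i = 1; i < n; i++)
            if (snaps[i].source_offset < edit_offset)
                best = i;
        return best;
    }

    /* The Flash-mode reaction to a keystroke at edit_offset is then roughly:
           k = pick_restart(snaps, n, edit_offset);
           restore_state(&snaps[k]);                 // hypothetical
           recompile_from(snaps[k].source_offset);   // hypothetical
    */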

Could such a strategy be used to give an up-to-date, accurate PDF?
Probably not. 
But that’s OK, as this is only meant to be giving a Preview while editing.
A full rerun is surely required anyway. An analog of Flash Mode can initiate 
this in the background, during times of non-editing.


One can ask why this kind of work has not been done already.
My points 1, 2, 4 above are relevant.
The actual coding was never made Open Source; it was OS-dependent
and required a fully-integrated Application.
Furthermore, the preferred output directly into PDF was beginning to take over.
Also, people wanted to use their own choice of Editor and viewing application;
so having a fully-integrated system was certainly not much favoured,
despite the extra benefits this could provide.

Besides, my previous point about Moore’s Law meant that people were not so 
concerned about speed any more, as things were getting much quicker anyway.


Maybe it is becoming time (20+ years later) to revisit these aspects?

It wouldn’t be hard in principle to extend pdfTeX to generate DVI output
at each shipout, alongside building the PDF.
TeX already knows the location of its input, at least to the line number.
There’s work to do to make this more precise, down to the character, say.
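
As a rough sketch of what recording that location at shipout time could amount
to (this is not pdfTeX’s actual internals; record_shipout and its arguments are
invented for illustration):

    #include <stdio.h>

    /* Append one record per shipped-out page: the page number and the source
       position (file and line) the engine had reached when the page was
       shipped out.  A previewer could later map pages back to source regions
       from this side file, much as SyncTeX does with its own records. */
    static void record_shipout(FILE *map, int page_no,
                               const char *input_file, int input_line)
    {
        fprintf(map, "%d %s %d\n", page_no, input_file, input_line);
    }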

Tying it all together, and reacting to user-input on all operating systems,
is likely the trickiest part of this kind of approach.


Hope this helps.

	Ross


> On 16 Jan 2022, at 2:59 am, Jamie Vicary <jamie.vicary at cl.cam.ac.uk> wrote:
> 
> Hi Peter, this is a great idea.
> 
> This would rely on the assumption that the latex engine reads the input
> file line-by-line as required, and does not substantially read ahead
> and store the input in memory. I don't know if this is true, but I'm
> sure many people on this list do.
> 
> The memory footprint might be quite large. But sequential snapshots
> could be stored as diffs to minimize this overhead.
> 
> Cheers,
> Jamie
> 
> On Sat, Jan 15, 2022 at 2:45 PM <selinger at mathstat.dal.ca> wrote:
>> 
>> Hi Jamie,
>> 
>> I think that what you are suggesting can almost be done at the
>> operating system level, without changing any aspect of LaTeX, except
>> to replace a few system calls.
>> 
>> What you need is the ability to
>> 
>> (1) make a snapshot of a running process (with all of its state,
>> including things like open files, content and current positions of
>> those files), and to resume the snapshot later (i.e., start a new
>> process in exactly the same state as the snapshot).
>> 
>> (2) keep track of when the process is reading from a file,
>> 
>> (3) keep track of when the process is writing to a file.
>> 
>> Basically every time the process writes to the relevant output file
>> (e.g. its main DVI or PDF output), make a note of everything that it
>> has read so far. Changes to any input files that are outside the area
>> that has currently been read cannot possibly affect the output to this
>> point.  Whenever a relevant input file changes, restart from the most
>> recent snapshot that could be affected by that change.
>> 
>> There are of course some features (such as reading the time of day)
>> that make TeX or any other program non-deterministic. These would
>> either have to be ignored or turned off.
>> 
>> If the snapshots can be made in a lightweight way (e.g., only storing
>> a "diff" from a previous snapshot), this would be relatively feasible.
>> Generally, input and output is buffered (i.e., written in much larger
>> chunks than necessary), so instead of merely replacing system calls
>> such as write(), it might be necessary to adjust some library
>> functions such as fwrite().
>> 
>> As others have noted, LaTeX also writes a bunch of auxiliary files
>> (.aux file, table of contents, etc), which affect the next pass over
>> the document, but one could choose to ignore these unless a full
>> recompile is requested.
>> 
>> The advantage is that this solution would work for practically any
>> program that reads input to produce incremental output; it is not
>> LaTeX specific.
>> 
>> -- Peter
>> 
>> Jamie Vicary wrote:
>>> 
>>> Hi Jim, Ross and others, thanks for your further comments.
>>> 
>>> I certainly agree that some parts of the document have substantially
>>> nonlocal effects, e.g. references, citations, indexing, and similar. I
>>> should have made clear, any functionality that ordinarily requires
>>> more than one pass of pdflatex is completely outside the scope of what
>>> I am suggesting. (This is the behaviour of BaKoMa TeX -- if you want
>>> your equation references etc to update, you have to fire off a full
>>> recompile.)
>>> 
>>>> I would guess the development effort to
>>>> do this would be considerable for someone who thoroughly knows the
>>>> internals of the TeX engine, and perhaps a tremendous effort for
>>>> someone starting from scratch.
>>> 
>>> I don't know anything about TeX, but I can develop code, and this
>>> feature is sufficiently important to my working style, that I would
>>> potentially be interested to take it on as a project, particularly if
>>> others could be involved.
>>> 
>>>> what I
>>>> described may be (far?) less than BaKoMa TeX does anyway; the author
>>>> of that had undoubtedly given it a *lot* more thought than me.
>>> 
>>> BaKoMa has its own issues, and is of course no longer being developed.
>>> At some point it will become incompatible with modern latex packages.
>>> 
>>> I think it would be great to have this fast-recompile feature in the
>>> open-source world. It doesn't matter if it has limitations at first,
>>> people will improve it over time.
>>> 
>>>> If your jobs are not compiling quickly enough for you, then the best option could well be to update your hardware, rather than fiddle with the fundamental design of the software.
>>> 
>>> My hardware is very good thank you! :) I of course understand many
>>> people might think this way.
>>> 
>>> For 10 years, I have worked on my latex documents with my code on the
>>> left side of my screen, and the BaKoMa preview on the right, giving me
>>> ultra-fast updating at the level of individual keystrokes (as long as
>>> the document is in a syntactically valid state.) These can be large
>>> documents, taking minutes for a full recompilation, and I am often
>>> making minute adjustments to graphical styles, tikz diagrams, etc. I
>>> am **multiple times** more productive with this system than I would be
>>> otherwise. I completely avoid the inconveniences of having to split my
>>> sections and figures into different files, and all those workarounds,
>>> and I must have saved cumulative weeks of my life waiting for
>>> documents to compile.
>>> 
>>> Cheers,
>>> Jamie
>>> 
>>> 
>>> On Fri, Jan 14, 2022 at 5:01 PM Jim Diamond <jdiamond at acadiau.ca> wrote:
>>>> 
>>>> Ross,
>>>> 
>>>> It seems your mail program clobbers the quoting in the plain text
>>>> part.
>>>> 
>>>> All,
>>>> 
>>>> At the cost of incorrect quoting below, I'll carry on with the email as-is.
>>>> 
>>>> 
>>>> On Fri, Jan 14, 2022 at 14:06 (+1100), Ross Moore wrote:
>>>> 
>>>>> Hi Jim, Karl and others.
>>>> 
>>>>> From: Jim Diamond via pdftex <pdftex at tug.org>
>>>>> Date: 14 January 2022 at 12:26:27 pm AEDT
>>>>> To: Karl Berry <karl at freefriends.org>, pdftex at tug.org
>>>>> Subject: Re: [pdftex] Caching intermediate compilation results for near-real-time PDF re-renders during editing
>>>>> Reply-To: Jim Diamond <jdiamond at acadiau.ca>
>>>> 
>>>>> Hi all,
>>>> 
>>>>> On Thu, Jan 13, 2022 at 16:36 (-0700), Karl Berry wrote:
>>>> 
>>>>> Hi Jamie - thanks for the interesting message. Thanh could say more, but
>>>>> FWIW, here are my reactions (in short, "sounds impossible").
>>>> 
>>>>> That statement may be true in one very limited sense only.
>>>>> In typical real-world documents, anything that occurs anywhere
>>>>> can have an effect on any other page of the final PDF.
>>>> 
>>>> This is true.  But see below.
>>>> 
>>>>> I recognize that you are employing hyperbole for effect here.  But
>>>>> thinking about the OP's question, I wonder... just how many variables
>>>>> are in play after a shipout?
>>>> 
>>>>> A shipout of a page does *not* mean that what comes afterwards is like
>>>>> a whole separate stand-alone document.
>>>> 
>>>>> To process what comes next still relies on everything that has been setup
>>>>> earlier, in terms of how macros will expand, or even what is defined.
>>>>> Think about targets of cross-references, citations, hyperlinks, etc.
>>>> 
>>>> Good point.
>>>> 
>>>>> There is no finite set of “variables” whose values can be saved.
>>>> 
>>>> Surely the collection of registers, macros and other objects
>>>> defining the state of the computation after a shipout is finite.
>>>> 
>>>>> You would need a snapshot of a portion of the memory,
>>>>> as well as a way to sensibly make use of it.
>>>> 
>>>> Which gets us back to the OP's question.
>>>> 
>>>> 
>>>>> Suppose a small change is then made to the latex source, such that the
>>>>> compiler determines this change would first affect page k.
>>>> 
>>>>> I can't imagine how that could be determined without retypesetting the
>>>>> entire document.
>>>> 
>>>>> Agreed.
>>>>> Theoretically, it is like the Halting problem for Turing machines.
>>>>> While the output is sometimes predictable, in general
>>>>> you can only know what a program will do by running it.
>>>>> And will it even stop? ... allowing you to examine the complete output
>>>>> that it will produce?  In general, NO.
>>>> 
>>>> I'd argue that solving the halting problem is sufficient but not
>>>> strictly necessary.
>>>> 
>>>>> Currently, various editors and document viewers can do lookups and
>>>>> reverse lookups so that one can go from a point in the source doc to
>>>>> the corresponding line in the compiled document, and vice versa.
>>>> 
>>>>> There is no way to predict what any given change will affect.
>>>> 
>>>> Perhaps not,  reports of the features of BaKoMa TeX (which I have
>>>> never used) notwithstanding.
>>>> 
>>>>> Is that really true?  ***Except for multi-page paragraphs (bah!),
>>>>> tables and similar***, is it possible for a change on page N of a
>>>>> document to affect anything before page N-1?
>>>> 
>>>>> Absolutely.
>>>>> That’s what the  .aux  file is typically used for.
>>>>> And many packages have their own auxiliary files which write into a file,
>>>>> so that extra information is available to possibly affect any page N-k
>>>>> (for any value of  k) on the next processing run.
>>>> 
>>>> It is true that if a change to the source causes the .aux (and
>>>> similar) files to be changed, that any/all of the document might be
>>>> changed the next time it is compiled.
>>>> 
>>>> But should we conclude that we can't do anything useful here,
>>>> following the OP's question?  I think we can.  (See below.)
>>>> 
>>>> 
>>>>> (While I can see change
>>>>> to a word in a continued paragraph on page N changing the typesetting
>>>>> of that paragraph, and thus possibly how page N-1 should be laid out,
>>>>> can page N-2 be affected?)
>>>> 
>>>>> This kind of thing is simple enough; but still not always.
>>>>> If the editing forces something onto the next page,
>>>>> the overall effect can just get more and more complicated.
>>>> 
>>>> <snip>
>>>> 
>>>> 
>>>>> I’ve been a TeX user since the late 1980s.
>>>>> In that time the speed of processing has increased considerably – due mostly
>>>>> to the speed of the (laptop or desktop) computer doing the processing.
>>>> 
>>>>> Today we do things that would have been inconceivable back last
>>>>> century, precisely because of the greater speed and available
>>>>> memory.  This growth is known as Moore’s Law – though not due to me,
>>>>> nor any known relative.
>>>> 
>>>> <snip>
>>>> 
>>>>> I believe very strongly in the benefits of ultra-fast recompilation
>>>> 
>>>>> If your jobs are not compiling quickly enough for you, then the best option
>>>>> could well be to update your hardware, rather than fiddle with
>>>>> the fundamental design of the software.
>>>> 
>>>> I've been a TeX user since the early 1980's.  Yes, things have sped up
>>>> considerably.  However, to be blunt, I think the suggestion "get
>>>> faster hardware" is a bit on the obnoxious side.  While the OP may
>>>> have more money than God, I have heard that there are lots of people
>>>> on the planet with limited financial means, and some of them may have
>>>> to do their computing with a low-end Raspberry Pi or even something
>>>> less capable.  (And, IMHO, people buying into the "just get
>>>> more/faster hardware" mantra is why we need 8 GB of RAM to look at a
>>>> web page.)
>>>> 
>>>> 
>>>> Anyway, for what it's worth here is a thought of how compilation could
>>>> be sped up to help someone quickly preview their documents.
>>>> 
>>>> There could be two types of "re-compilation":
>>>> (1) A full (re-)compilation, perhaps running pdf(la)tex the usual
>>>>    number of times to ensure all the ToC entries, cross references,
>>>>    and so on are done correctly.
>>>>    These runs (only the last one is really relevant) could save
>>>>    whatever computation state is needed at the end of each page.
>>>>    Further, similar to synctex, it could record a correspondence
>>>>    between source locations and PDF locations.
>>>> (2) A "best-effort", "fast" re-compilation could look at where in the
>>>>    source the first change is since the last "full" (re-)compilation;
>>>>    the editor would have to keep track of this.  Suppose the point of
>>>>    change was found at page N in the most recent full re-compilation.
>>>>    Recognizing that this change *might* have affected previous pages,
>>>>    it could boldly carry on and load the state of the universe at the
>>>>    end of page N-1 and then carry on the compilation from there.
>>>> 
>>>> This best-effort compilation would allow the user to quickly see how
>>>> local changes look, at the cost of the user needing to recognize that
>>>> a full re-compilation may change things dramatically.  But in many
>>>> cases, this might be enough to make a user happy.
>>>> 
>>>> Jamie: having said all that, I would guess the development effort to
>>>> do this would be considerable for someone who thoroughly knows the
>>>> internals of the TeX engine, and perhaps a tremendous effort for
>>>> someone starting from scratch. I'd further guess that unless some grad
>>>> student out there finds this to be an interesting project, and his/her
>>>> supervisor thinks it can be turned into a thesis and/or publishable
>>>> material, I'm not sure I see this happening, even though what I
>>>> described may be (far?) less than BaKoMa TeX does anyway; the author
>>>> of that had undoubtedly given it a *lot* more thought than me.
>>>> 
>>>> 
>>>> Cheers.
>>>>                                Jim
>>> 
>> 


Dr Ross Moore
Department of Mathematics and Statistics 
12 Wally’s Walk, Level 7, Room 734
Macquarie University, NSW 2109, Australia
T: +61 2 9850 8955  |  F: +61 2 9850 8114
M:+61 407 288 255  |  E: ross.moore at mq.edu.au
http://www.maths.mq.edu.au

CRICOS Provider Number 00002J. Think before you print. 
Please consider the environment before printing this email.

This message is intended for the addressee named and may 
contain confidential information. If you are not the intended 
recipient, please delete it and notify the sender. Views expressed 
in this message are those of the individual sender, and are not 
necessarily the views of Macquarie University.