[pdftex] Caching intermediate compilation results for near-real-time PDF re-renders during editing

Fri Jan 14 04:06:47 CET 2022

Hi Jim, Karl and others.

>> From: Jim Diamond via pdftex <pdftex at tug.org>
>> Date: 14 January 2022 at 12:26:27 pm AEDT
>> To: Karl Berry <karl at freefriends.org>, pdftex at tug.org
>> Subject: Re: [pdftex] Caching intermediate compilation results for near-real-time PDF re-renders during editing
>> Reply-To: Jim Diamond <jdiamond at acadiau.ca>
>> 
>> Hi all,
>> 
>> On Thu, Jan 13, 2022 at 16:36 (-0700), Karl Berry wrote:
>> 
>>> Hi Jamie - thanks for the interesting message. Thanh could say more, but
>>> FWIW, here are my reactions (in short, "sounds impossible").
>> 
>>>> My understanding of pdftex is that it operates on a per-page basis.

That statement may be true in one very limited sense only.
In typical real-world documents, anything that occurs anywhere
can have an effect on any other page of the final PDF.

>> 
>>> Plus about a million global variables representing the state of
>>> the compiler. (All the TeX engines are similar in this regard, having
>>> inherited their basic architecture from Knuth's tex.web.)
>> 
>> I recognize that you are employing hyperbole for effect here.  But
>> thinking about the OP's question, I wonder... just how many variables
>> are in play after a shipout?

A shipout of a page does *not* mean that what comes afterwards is like
a whole separate stand-alone document.

Processing what comes next still relies on everything that has been set up
earlier, in terms of how macros will expand, or even what is defined.
Think about targets of cross-references, citations, hyperlinks, etc.

There is no finite set of “variables” whose values can be saved.
You would need a snapshot of a portion of the memory,
as well as a way to sensibly make use of it.
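
A minimal sketch of state surviving a shipout, assuming an ordinary
article-class document. The macro name \status and the counter name
"example" are made up purely for illustration:

```latex
\documentclass{article}
\newcommand{\status}{draft}   % hypothetical macro, for illustration only
\newcounter{example}          % hypothetical counter, for illustration only
\begin{document}
\renewcommand{\status}{final}% redefinition made while building page 1
\refstepcounter{example}%      counter now holds 1
Page one.
\newpage % page 1 is shipped out here
% Nothing was reset by the shipout: both the macro and the counter
% still carry the values they were given on page 1.
This is the \status\ version, after example~\theexample.
\end{document}
```

Page 2 cannot be typeset correctly without knowing what page 1 did to
\status and to the counter. In a real document the same holds for every
macro, register, font and catcode in memory.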

>> 
>>>> Suppose a small change is then made to the latex source, such that the
>>>> compiler determines this change would first affect page k.
>> 
>>> I can't imagine how that could be determined without retypesetting the
>>> entire document.

Agreed.
Theoretically, it is like the halting problem for Turing machines.
While the output is sometimes predictable, in general
you can only know what a program will do by running it.
And will it even stop, allowing you to examine the complete output
that it will produce?  In general, you cannot know that in advance.
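
To make the point concrete, here is a small sketch (the Collatz
iteration is just a convenient example chosen here, and the register
name \val is hypothetical). TeX's macro language is a full programming
language, so a document can contain a loop like this one:

```latex
\documentclass{article}
\newcount\val % hypothetical register name, for illustration only
\begin{document}
\val=27
% Collatz iteration: halve even values, map odd values n to 3n+1,
% and typeset each value until 1 is reached.
\loop
  \ifodd\val \multiply\val by 3 \advance\val by 1 \else \divide\val by 2 \fi
  \number\val\space
\ifnum\val>1 \repeat
\end{document}
```

How many numbers this typesets, and hence how much space it takes on
the page, depends on the starting value in a way that can only be found
by running the loop. Whether it stops for every positive starting value
is a well-known open problem.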

>> 
>> Currently, various editors and document viewers can do lookups and
>> reverse lookups so that one can go from a point in the source doc to
>> the corresponding line in the compiled document, and vice versa.
>> 
>>> There is no way to predict what any given change will affect.
>> 
>> Is that really true?  ***Except for multi-page paragraphs (bah!),
>> tables and similar***, is it possible for a change on page N of a
>> document to affect anything before page N-1?  

Absolutely.
That’s what the .aux file is typically used for.
And many packages write their own auxiliary files,
so that extra information is available to possibly affect any page N-k
(for any value of k) on the next processing run.
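
A minimal sketch of this, assuming a plain article-class document (the
label name sec:end is made up for illustration):

```latex
\documentclass{article}
\begin{document}
% Page 1 refers forward to a label that is only defined on page 3.
The conclusion is Section~\ref{sec:end}, on page~\pageref{sec:end}.
\newpage
Some middle material.
\newpage
\section{Conclusion}\label{sec:end}
% The label's section and page numbers are written to the .aux file.
% If editing moves or renumbers this section, the text on page 1
% changes too, but only on the run after the .aux file is rewritten.
\end{document}
```

On the first run both references come out as "??"; the second run reads
the .aux file and fills them in. An edit near the end of the document
has then changed what is typeset on page 1.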

>> (While I can see change
>> to a word in a continued paragraph on page N changing the typesetting
>> of that paragraph, and thus possibly how page N-1 should be laid out,
>> can page N-2 be affected?)

This kind of case is often simple enough, but not always.
If the editing forces something onto the next page,
the overall effect can just get more and more complicated.

Besides, with pdfTeX it is not whole pages that would be the best objects to cache;
rather, it would be better to cache the individual PDF objects from which a full PDF is built.
These are already discrete objects that need a lot of book-keeping to produce
the final document. 

With “Tagged PDF” there can be hundreds (even thousands) of these that affect 
any given page.
It’s not at all clear that deciding which need to be updated, and fitting them together
with others that are unchanged, would be any more efficient or reliable than rebuilding 
the whole document from scratch (+ all the auxiliary files in their most-recent state).

>> 
>> I will admit I post this on a public forum recognizing that I may end
>> up looking like a total ignoramus.  I'll look at it as a bit of
>> "life-long learning".  :-)

I’ve been a TeX user since the late 1980s.
In that time the speed of processing has increased considerably – due mostly
to the speed of the (laptop or desktop) computer doing the processing.

Today we do things that would have been inconceivable back last century,
precisely because of the greater speed and available memory.
This growth is known as Moore’s Law – though not due to me, nor any known relative.

>> 
>>>> possible in principle because this is the basis of operation of BaKoMa
>>>> TeX, which I have used for years.
>> 
>>> That is amazing. I wonder if Malyshev's heirs can somehow be contacted
>>> to get the source freed. We have had no luck even finding notices
>>> beyond the bare fact of his death :(.
>> 
>>>> I believe very strongly in the benefits of ultra-fast recompilation

If your jobs are not compiling quickly enough for you, then the best option 
could well be to update your hardware, rather than fiddle with 
the fundamental design of the software.

>> 
>>> I agree, but can't conceive of how Basil implemented what you describe.
>>> A failure of imagination on my part, no doubt. --best, karl.

>> 
>> It does make for an interesting puzzle, don't you think?

Yes, but it is a fallacy to view the output PDF as a sequence of totally separate pages,
and to use this as the basis for document processing.

It may be like that for on-paper printing; but electronic methods have moved on, considerably. 

>> 
>> Cheers.
>> 
>>                                Jim

Hope this helps.

	Ross

Dr Ross Moore
Department of Mathematics and Statistics 
12 Wally’s Walk, Level 7, Room 734
Macquarie University, NSW 2109, Australia
T: +61 2 9850 8955  |  F: +61 2 9850 8114
M:+61 407 288 255  |  E: ross.moore at mq.edu.au
http://www.maths.mq.edu.au

CRICOS Provider Number 00002J. Think before you print. 
Please consider the environment before printing this email.

This message is intended for the addressee named and may 
contain confidential information. If you are not the intended 
recipient, please delete it and notify the sender. Views expressed 
in this message are those of the individual sender, and are not 
necessarily the views of Macquarie University. <http://mq.edu.au/>