[pdftex] Caching intermediate compilation results for near-real-time PDF re-renders during editing

Jamie Vicary jamie.vicary at cl.cam.ac.uk
Sat Jan 15 16:59:57 CET 2022


Hi Peter, this is a great idea.

This would rely on the assumption that the latex engine reads the input
file line-by-line as required, and does not substantially read ahead
and store the input in memory. I don't know if this is true, but I'm
sure many people on this list do.

The memory footprint might be quite large. But sequential snapshots
could be stored as diffs to minimize this overhead.
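
A minimal sketch of that diff idea, assuming snapshots are raw byte
images compared at page granularity (the constants and helper names
below are illustrative, not existing tooling):

    import zlib

    PAGE = 4096  # assumed granularity: one memory page per unit

    def delta(prev: bytes, curr: bytes) -> dict:
        # Keep only the pages that changed since the previous
        # snapshot; unchanged pages are shared with the older image.
        return {i: zlib.compress(curr[i:i + PAGE])
                for i in range(0, len(curr), PAGE)
                if curr[i:i + PAGE] != prev[i:i + PAGE]}

    def restore(prev: bytes, d: dict) -> bytes:
        # Rebuild the newer snapshot from the older one plus the diff.
        buf = bytearray(prev)
        for i, blob in d.items():
            buf[i:i + PAGE] = zlib.decompress(blob)
        return bytes(buf)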

Cheers,
Jamie

On Sat, Jan 15, 2022 at 2:45 PM <selinger at mathstat.dal.ca> wrote:
>
> Hi Jamie,
>
> I think that what you are suggesting can almost be done at the
> operating system level, without changing any aspect of LaTeX, except
> to replace a few system calls.
>
> What you need is the ability to
>
> (1) make a snapshot of a running process (with all of its state,
> including things like open files, content and current positions of
> those files), and to resume the snapshot later (i.e., start a new
> process in exactly the same state as the snapshot; see the sketch
> below).
>
> (2) keep track of when the process is reading from a file,
>
> (3) keep track of when the process is writing to a file.
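>
> For (1), Linux's CRIU ("checkpoint/restore in userspace") can already
> dump and resume a whole process. A minimal sketch of driving it,
> assuming a Linux box with criu installed and enough privileges (the
> wrapper functions themselves are hypothetical, not existing tooling):
>
>     import subprocess
>
>     def checkpoint(pid: int, image_dir: str) -> None:
>         # Dump the full process state (memory, open files and their
>         # offsets) into image_dir, leaving the process running.
>         subprocess.run(["criu", "dump", "-t", str(pid),
>                         "-D", image_dir, "--shell-job",
>                         "--leave-running"], check=True)
>
>     def restore(image_dir: str) -> None:
>         # Recreate the dumped process exactly as it was.
>         subprocess.run(["criu", "restore", "-D", image_dir,
>                         "--shell-job"], check=True)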
>
> Basically every time the process writes to the relevant output file
> (e.g. its main DVI or PDF output), make a note of everything that it
> has read so far. Changes to any input files that are outside the area
> that has currently been read cannot possibly affect the output to this
> point.  Whenever a relevant input file changes, restart from the most
> recent snapshot that cannot have been affected by that change.
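>
> A minimal Python sketch of that bookkeeping (the Snapshot record and
> all names are illustrative assumptions, not an existing API):
>
>     from dataclasses import dataclass, field
>
>     @dataclass
>     class Snapshot:
>         image_dir: str
>         # bytes consumed from each input file at dump time
>         read_watermarks: dict = field(default_factory=dict)
>
>     def latest_safe_snapshot(snapshots, changed_file, change_offset):
>         # A snapshot is safe to resume from if, when it was taken,
>         # the engine had not yet read past the edited byte.
>         for snap in reversed(snapshots):
>             if change_offset >= snap.read_watermarks.get(changed_file, 0):
>                 return snap
>         return None  # edit precedes every snapshot: full recompile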
>
> There are of course some features (such as reading the time of day)
> that make TeX or any other program non-deterministic. These would
> either have to be ignored or turned off.
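>
> Some of this exists already: pdfTeX honors the SOURCE_DATE_EPOCH
> environment variable, which pins the timestamps it would otherwise
> take from the clock, and FORCE_SOURCE_DATE=1 extends that to the
> date/time primitives. A sketch, assuming pdflatex is on the PATH:
>
>     import os, subprocess
>
>     # Fix the engine's notion of "now" so repeated runs agree.
>     env = dict(os.environ, SOURCE_DATE_EPOCH="0",
>                FORCE_SOURCE_DATE="1")
>     subprocess.run(["pdflatex", "main.tex"], env=env, check=True)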
>
> If the snapshots can be made in a lightweight way (e.g., only storing
> a "diff" from a previous snapshot), this would be relatively feasible.
> Generally, input and output is buffered (i.e., written in much larger
> chunks than necessary), so instead of merely replacing system calls
> such as write(), it might be necessary to adjust some library
> functions such as fwrite().
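>
> A hypothetical Python analogue of such an interposed reader (a real
> version would wrap fread()/getc() in C, but the idea is the same):
>
>     class TrackingReader:
>         # Records the furthest byte ever consumed; that high-water
>         # mark is exactly what each snapshot needs to store per file.
>         def __init__(self, f):
>             self._f = f
>             self.high_water = 0
>
>         def read(self, n=-1):
>             data = self._f.read(n)
>             self.high_water = max(self.high_water, self._f.tell())
>             return data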
>
> As others have noted, LaTeX also writes a bunch of auxiliary files
> (.aux file, table of contents, etc.), which affect the next pass over
> the document, but one could choose to ignore these unless a full
> recompile is requested.
>
> The advantage is that this solution would work for practically any
> program that reads input to produce incremental output; it is not
> LaTeX specific.
>
> -- Peter
>
> Jamie Vicary wrote:
> >
> > Hi Jim, Ross and others, thanks for your further comments.
> >
> > I certainly agree that some parts of the document have substantially
> > nonlocal effects, e.g. references, citations, indexing, and similar. I
> > should have made clear that any functionality which ordinarily requires
> > more than one pass of pdflatex is completely outside the scope of what
> > I am suggesting. (This is the behaviour of BaKoMa TeX -- if you want
> > your equation references etc to update, you have to fire off a full
> > recompile.)
> >
> > > I would guess the development effort to
> > > do this would be considerable for someone who thoroughly knows the
> > > internals of the TeX engine, and perhaps a tremendous effort for
> > > someone starting from scratch.
> >
> > I don't know anything about TeX, but I can develop code, and this
> > feature is sufficiently important to my working style that I would
> > potentially be interested in taking it on as a project, particularly if
> > others could be involved.
> >
> > > what I
> > > described may be (far?) less than BaKoMa TeX does anyway; the author
> > > of that had undoubtedly given it a *lot* more thought than me.
> >
> > BaKoMa has its own issues, and is of course no longer being developed.
> > At some point it will become incompatible with modern latex packages.
> >
> > I think it would be great to have this fast-recompile feature in the
> > open-source world. It doesn't matter if it has limitations at first;
> > people will improve it over time.
> >
> > > If your jobs are not compiling quickly enough for you, then the best option could well be to update your hardware, rather than fiddle with the fundamental design of the software.
> >
> > My hardware is very good, thank you! :) I of course understand many
> > people might think this way.
> >
> > For 10 years, I have worked on my latex documents with my code on the
> > left side of my screen, and the BaKoMa preview on the right, giving me
> > ultra-fast updating at the level of individual keystrokes (as long as
> > the document is in a syntactically valid state). These can be large
> > documents, taking minutes for a full recompilation, and I am often
> > making minute adjustments to graphical styles, tikz diagrams, etc. I
> > am **multiple times** more productive with this system than I would be
> > otherwise. I completely avoid the inconveniences of having to split my
> > sections and figures into different files, and all those workarounds,
> > and I must have saved cumulative weeks of my life waiting for
> > documents to compile.
> >
> > Cheers,
> > Jamie
> >
> >
> > On Fri, Jan 14, 2022 at 5:01 PM Jim Diamond <jdiamond at acadiau.ca> wrote:
> > >
> > > Ross,
> > >
> > > It seems your mail program clobbers the quoting in the plain text
> > > part.
> > >
> > > All,
> > >
> > > At the cost of incorrect quoting below, I'll carry on with the email as-is.
> > >
> > >
> > > On Fri, Jan 14, 2022 at 14:06 (+1100), Ross Moore wrote:
> > >
> > > > Hi Jim, Karl and others.
> > >
> > > > From: Jim Diamond via pdftex <pdftex at tug.org>
> > > > Date: 14 January 2022 at 12:26:27 pm AEDT
> > > > To: Karl Berry <karl at freefriends.org>, pdftex at tug.org
> > > > Subject: Re: [pdftex] Caching intermediate compilation results for near-real-time PDF re-renders during editing
> > > > Reply-To: Jim Diamond <jdiamond at acadiau.ca>
> > >
> > > > Hi all,
> > >
> > > > On Thu, Jan 13, 2022 at 16:36 (-0700), Karl Berry wrote:
> > >
> > > > Hi Jamie - thanks for the interesting message. Thanh could say more, but
> > > > FWIW, here are my reactions (in short, "sounds impossible").
> > >
> > > > That statement may be true in one very limited sense only.
> > > > In typical real-world documents, anything that occurs anywhere
> > > > can have an effect on any other page of the final PDF.
> > >
> > > This is true.  But see below.
> > >
> > > > I recognize that you are employing hyperbole for effect here.  But
> > > > thinking about the OP's question, I wonder... just how many variables
> > > > are in play after a shipout?
> > >
> > > > A shipout of a page does *not* mean that what comes afterwards is like
> > > > a whole separate stand-alone document.
> > >
> > > > To process what comes next still relies on everything that has been set up
> > > > earlier, in terms of how macros will expand, or even what is defined.
> > > > Think about targets of cross-references, citations, hyperlinks, etc.
> > >
> > > Good point.
> > >
> > > > There is no finite set of “variables” whose values can be saved.
> > >
> > > Surely the collection of registers, macros and other objects
> > > defining the state of the computation after a shipout is finite.
> > >
> > > > You would need a snapshot of a portion of the memory,
> > > > as well as a way to sensibly make use of it.
> > >
> > > Which gets us back to the OP's question.
> > >
> > >
> > > > Suppose a small change is then made to the latex source, such that the
> > > > compiler determines this change would first affect page k.
> > >
> > > > I can't imagine how that could be determined without retypesetting the
> > > > entire document.
> > >
> > > > Agreed.
> > > > Theoretically, it is like the Halting problem for Turing machines.
> > > > While the output is sometimes predictable, in general
> > > > you can only know what a program will do by running it.
> > > > And will it even stop? ... allowing you to examine the complete output
> > > > that it will produce?  In general, NO.
> > >
> > > I'd argue that solving the halting problem is sufficient but not
> > > strictly necessary.
> > >
> > > > Currently, various editors and document viewers can do lookups and
> > > > reverse lookups so that one can go from a point in the source doc to
> > > > the corresponding line in the compiled document, and vice versa.
> > >
> > > > There is no way to predict what any given change will affect.
> > >
> > > Perhaps not,  reports of the features of BaKoMa TeX (which I have
> > > never used) notwithstanding.
> > >
> > > > Is that really true?  ***Except for multi-page paragraphs (bah!),
> > > > tables and similar***, is it possible for a change on page N of a
> > > > document to affect anything before page N-1?
> > >
> > > > Absolutely.
> > > > That’s what the .aux file is typically used for.
> > > > And many packages have their own auxiliary files into which they write,
> > > > so that extra information is available that can affect any page N-k
> > > > (for any value of k) on the next processing run.
> > >
> > > It is true that if a change to the source causes the .aux (and
> > > similar) files to be changed, then any/all of the document might be
> > > changed the next time it is compiled.
> > >
> > > But should we conclude that we can't do anything useful here,
> > > following the OP's question?  I think we can.  (See below.)
> > >
> > >
> > > > (While I can see change
> > > > to a word in a continued paragraph on page N changing the typesetting
> > > > of that paragraph, and thus possibly how page N-1 should be laid out,
> > > > can page N-2 be affected?)
> > >
> > > > This kind of thing is simple enough, but still not always.
> > > > If the editing forces something onto the next page,
> > > > the overall effect can just get more and more complicated.
> > >
> > > <snip>
> > >
> > >
> > > > I’ve been a TeX user since the late 1980s.
> > > > In that time the speed of processing has increased considerably – due mostly
> > > > to the speed of the (laptop or desktop) computer doing the processing.
> > >
> > > > Today we do things that would have been inconceivable back last
> > > > century, precisely because of the greater speed and available
> > > > memory.  This growth is known as Moore’s Law – though not due to me,
> > > > nor any known relative.
> > >
> > > <snip>
> > >
> > > > I believe very strongly in the benefits of ultra-fast recompilation
> > >
> > > > If your jobs are not compiling quickly enough for you, then the best option
> > > > could well be to update your hardware, rather than fiddle with
> > > > the fundamental design of the software.
> > >
> > > I've been a TeX user since the early 1980s.  Yes, things have sped up
> > > considerably.  However, to be blunt, I think the suggestion "get
> > > faster hardware" is a bit on the obnoxious side.  While the OP may
> > > have more money than God, I have heard that there are lots of people
> > > on the planet with limited financial means, and some of them may have
> > > to do their computing with a low-end Raspberry Pi or even something
> > > less capable.  (And, IMHO, people buying into the "just get
> > > more/faster hardware" mantra is why we need 8 GB of RAM to look at a
> > > web page.)
> > >
> > >
> > > Anyway, for what it's worth here is a thought of how compilation could
> > > be sped up to help someone quickly preview their documents.
> > >
> > > There could be two types of "re-compilation":
> > > (1) A full (re-)compilation, perhaps running pdf(la)tex the usual
> > >     number of times to ensure all the ToC entries, cross references,
> > >     and so on are done correctly.
> > >     These runs (only the last one is really relevant) could save
> > >     whatever computation state is needed at the end of each page.
> > >     Further, similar to synctex, it could record a correspondence
> > >     between source locations and PDF locations.
> > > (2) A "best-effort", "fast" re-compilation could look at where in the
> > >     source the first change is since the last "full" (re-)compilation;
> > >     the editor would have to keep track of this.  Suppose the point of
> > >     change was found at page N in the most recent full re-compilation.
> > >     Recognizing that this change *might* have affected previous pages,
> > >     it could boldly carry on and load the state of the universe at the
> > >     end of page N-1 and then carry on the compilation from there.
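> > >
> > > In code, the fast path of (2) might look like this hypothetical
> > > sketch (the engine hook at the end is the genuinely hard part,
> > > and is only stubbed here):
> > >
> > >     def fast_recompile(changed_line, line_to_page, page_states):
> > >         # Map the first edited source line to the page it landed
> > >         # on in the last full compile, then resume typesetting
> > >         # from the state saved at the end of the previous page.
> > >         n = line_to_page[changed_line]
> > >         snapshot = page_states[max(n - 1, 0)]
> > >         return resume_typesetting(snapshot)
> > >
> > >     def resume_typesetting(snapshot):
> > >         # Placeholder: restore the saved engine state and let the
> > >         # engine carry on shipping out pages from page N onward.
> > >         raise NotImplementedError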
> > >
> > > This best-effort compilation would allow the user to quickly see how
> > > local changes look, at the cost of the user needing to recognize that
> > > a full re-compilation may change things dramatically.  But in many
> > > cases, this might be enough to make a user happy.
> > >
> > > Jamie: having said all that, I would guess the development effort to
> > > do this would be considerable for someone who thoroughly knows the
> > > internals of the TeX engine, and perhaps a tremendous effort for
> > > someone starting from scratch. I'd further guess that unless some grad
> > > student out there finds this to be an interesting project, and his/her
> > > supervisor thinks it can be turned into a thesis and/or publishable
> > > material, I'm not sure I see this happening, even though what I
> > > described may be (far?) less than BaKoMa TeX does anyway; the author
> > > of that had undoubtedly given it a *lot* more thought than me.
> > >
> > >
> > > Cheers.
> > >                                 Jim
> >
>


