[texworks] Backward compatibility of the Qt bindings in poppler

Stefan Löffler st.loeffler at gmail.com
Fri Jun 4 07:49:39 CEST 2021

Dear Oliver,

On 25.05.21 13:12, Oliver Sander wrote:
> as of Poppler commit 45717a50c52ab13f405584eab4e1c586bd39a0ce (from 
> May 19),
> the Poppler Qt6 interface does now indeed return std::unique_ptr objects
> wherever appropriate.

Thanks for the heads-up. I have started looking into adapting the 
TeXworks code. In doing so, the question of object lifetime came up (for 
context, TeXworks uses multithreading, which doesn't make lifetime 
matters any easier; however, the stuff I describe here applies also to 
single-threaded code). My main concern ATM is Poppler::Page:

Take the following (constructed) snippet (no error checking for the sake 
of conciseness):

std::unique_ptr<Poppler::Page> page;
     page = Poppler::Document::load("1.pdf")->page(0);

The resulting code compiles, but is prone to segfault as the 
Poppler::Document goes out of scope and is destroyed at the } while the 
Poppler::Page lives on. As a result, some calls such as 
page->pageSizeF() still succeed, while others crash, such as 
page->label() or page->renderToImage(). The root cause, of course, is 
that the page seems to keep a pointer to the document to which it 
belongs, but this document is destroyed prematurely without the page 
noticing. Presumably, this problem isn't new, i.e. delete'ing a 
Poppler::Document* before all of its Poppler::Page's in previous 
versions had the same effect. But now, with std::unqiue_ptr, it's easier 
than ever that this could happen "accidentally" (BTW: it also doesn't 
seem to be documented which methods require Poppler::Document-access and 
which ones don't).

So far, I basically had two ideas how this could potentially be resolved:
1) Make Poppler::Page* be owned by Poppler::Document instead of the 
calling code (i.e., make Poppler::Document::page return raw pointers 
again instead of std::unique_ptr and "cache" the unique_ptr somewhere in 
Poppler::Document). The reasoning behind this would be that pages 
"belong" to a document, and they should be destroyed when the document 
is destructed.
2) Make the underlying data needed for all methods of Poppler::Document 
and Poppler::Page to work a std::shared_ptr (or similar). I.e., on the 
surface, decouple Document and Pages (they can be created and destroyed 
more or less independently - there are no method calls between them or 
pointers from one to the other), but have a "hidden state" that lives on 
until the very last potential user (Document or Page) is destroyed.

To me, 1) seems to make slightly more sense philosophically, as pages 
belong to their document and I don't see a good (fundamental) reason ATM 
for allowing pages to "outlive" their document. At the same time, 1) 
seems a lot more cumbersome; it drastically changes ownership rules; it 
would require page caching in Poppler::Document.

On a somewhat related note, I noticed that Poppler::Page is marked as 
Q_DISABLE_COPY(Page), which philosophically sort of makes sense to me, 
but seems to be trivially circumvented by calling 
Poppler::Document::page() multiple times as this always constructs a 
"new Page()" and returns a std::unique_ptr. So it would seem to me that 
the object can be copied quite easily (on the surface, no extensive deep 
cloning of meta-data seems required), it's just disabled.

Please forgive my rambling, philosophical lifetime issues are not really 
my strong suite. But hopefully, you can give some comments and insights 
into these matters, that maybe also help me figure out how to best 
approach this in TeXworks' multi-threaded wrapper code.


PS: I haven't looked into the other instances of unique_ptr, such as 
Links, Annotations, Fonts, ... yet.

More information about the texworks mailing list.