[texworks] Backward compatibility of the Qt bindings in poppler
Stefan Löffler
st.loeffler at gmail.com
Fri Jun 4 07:49:39 CEST 2021
Dear Oliver,
On 25.05.21 13:12, Oliver Sander wrote:
> as of Poppler commit 45717a50c52ab13f405584eab4e1c586bd39a0ce (from
> May 19),
> the Poppler Qt6 interface does now indeed return std::unique_ptr objects
> wherever appropriate.
Thanks for the heads-up. I have started looking into adapting the
TeXworks code. In doing so, the question of object lifetime came up (for
context, TeXworks uses multithreading, which doesn't make lifetime
matters any easier; however, the stuff I describe here applies also to
single-threaded code). My main concern ATM is Poppler::Page:
Take the following (constructed) snippet (no error checking for the sake
of conciseness):
std::unique_ptr<Poppler::Page> page;
{
page = Poppler::Document::load("1.pdf")->page(0);
}
The resulting code compiles, but is prone to segfault as the
Poppler::Document goes out of scope and is destroyed at the } while the
Poppler::Page lives on. As a result, some calls such as
page->pageSizeF() still succeed, while others crash, such as
page->label() or page->renderToImage(). The root cause, of course, is
that the page seems to keep a pointer to the document to which it
belongs, but this document is destroyed prematurely without the page
noticing. Presumably, this problem isn't new, i.e. delete'ing a
Poppler::Document* before all of its Poppler::Page's in previous
versions had the same effect. But now, with std::unqiue_ptr, it's easier
than ever that this could happen "accidentally" (BTW: it also doesn't
seem to be documented which methods require Poppler::Document-access and
which ones don't).
So far, I basically had two ideas how this could potentially be resolved:
1) Make Poppler::Page* be owned by Poppler::Document instead of the
calling code (i.e., make Poppler::Document::page return raw pointers
again instead of std::unique_ptr and "cache" the unique_ptr somewhere in
Poppler::Document). The reasoning behind this would be that pages
"belong" to a document, and they should be destroyed when the document
is destructed.
2) Make the underlying data needed for all methods of Poppler::Document
and Poppler::Page to work a std::shared_ptr (or similar). I.e., on the
surface, decouple Document and Pages (they can be created and destroyed
more or less independently - there are no method calls between them or
pointers from one to the other), but have a "hidden state" that lives on
until the very last potential user (Document or Page) is destroyed.
To me, 1) seems to make slightly more sense philosophically, as pages
belong to their document and I don't see a good (fundamental) reason ATM
for allowing pages to "outlive" their document. At the same time, 1)
seems a lot more cumbersome; it drastically changes ownership rules; it
would require page caching in Poppler::Document.
On a somewhat related note, I noticed that Poppler::Page is marked as
Q_DISABLE_COPY(Page), which philosophically sort of makes sense to me,
but seems to be trivially circumvented by calling
Poppler::Document::page() multiple times as this always constructs a
"new Page()" and returns a std::unique_ptr. So it would seem to me that
the object can be copied quite easily (on the surface, no extensive deep
cloning of meta-data seems required), it's just disabled.
Please forgive my rambling, philosophical lifetime issues are not really
my strong suite. But hopefully, you can give some comments and insights
into these matters, that maybe also help me figure out how to best
approach this in TeXworks' multi-threaded wrapper code.
Best,
Stefan
PS: I haven't looked into the other instances of unique_ptr, such as
Links, Annotations, Fonts, ... yet.
More information about the texworks
mailing list.