[texworks] Backward compatibility of the Qt bindings in poppler

Oliver Sander oliver.sander at tu-dresden.de
Fri Jun 11 09:09:23 CEST 2021


Hi Stefan,

thanks for the feedback!

> std::unique_ptr<Poppler::Page> page;
> {
>      page = Poppler::Document::load("1.pdf")->page(0);
> }

I see the problem.  On the other, this code snippet was buggy with the
old interface as well.  Previously, the problem was a leaking Document
object; now it is a use-after-free which is likely to lead to crashes.

Regarding your suggested solutions: My understanding of the poppler design
is too limited for an opinion here.  That is something that Albert has to
decide.  I would like to mention option 3), however, which is:
"Keep everything as it is, but document the behavior better".

As for the Q_DISABLE_COPY questions, that's for Albert, too.

Viele Grüße,
Oliver

> 
> The resulting code compiles, but is prone to segfault as the Poppler::Document goes out of scope and is destroyed at the } while the Poppler::Page lives on. As a result, some calls such as page->pageSizeF() still succeed, while others crash, such as page->label() or page->renderToImage(). The root cause, of course, is that the page seems to keep a pointer to the document to which it belongs, but this document is destroyed prematurely without the page noticing. Presumably, this problem isn't new, i.e. delete'ing a Poppler::Document* before all of its Poppler::Page's in previous versions had the same effect. But now, with std::unqiue_ptr, it's easier than ever that this could happen "accidentally" (BTW: it also doesn't seem to be documented which methods require Poppler::Document-access and which ones don't).
> 
> So far, I basically had two ideas how this could potentially be resolved:
> 1) Make Poppler::Page* be owned by Poppler::Document instead of the calling code (i.e., make Poppler::Document::page return raw pointers again instead of std::unique_ptr and "cache" the unique_ptr somewhere in Poppler::Document). The reasoning behind this would be that pages "belong" to a document, and they should be destroyed when the document is destructed.
> 2) Make the underlying data needed for all methods of Poppler::Document 
and Poppler::Page to work a std::shared_ptr (or similar). I.e., on the surface, decouple Document and Pages (they can be created and destroyed more or less independently - there are no method calls between them or pointers from one to the other), but have a "hidden state" that lives on until 
the very last potential user (Document or Page) is destroyed.
> 
> To me, 1) seems to make slightly more sense philosophically, as pages belong to their document and I don't see a good (fundamental) reason ATM for allowing pages to "outlive" their document. At the same time, 1) seems 
a lot more cumbersome; it drastically changes ownership rules; it would require page caching in Poppler::Document.
> 
> On a somewhat related note, I noticed that Poppler::Page is marked as Q_DISABLE_COPY(Page), which philosophically sort of makes sense to me, but 
seems to be trivially circumvented by calling Poppler::Document::page() multiple times as this always constructs a "new Page()" and returns a std::unique_ptr. So it would seem to me that the object can be copied quite easily (on the surface, no extensive deep cloning of meta-data seems required), it's just disabled.
> 
> Please forgive my rambling, philosophical lifetime issues are not really my strong suite. But hopefully, you can give some comments and insights 
into these matters, that maybe also help me figure out how to best approach this in TeXworks' multi-threaded wrapper code.
> 
> Best,
> Stefan
> 
> PS: I haven't looked into the other instances of unique_ptr, such as Links, Annotations, Fonts, ... yet.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 5198 bytes
Desc: S/MIME Cryptographic Signature
URL: <https://tug.org/pipermail/texworks/attachments/20210611/e630ce1a/attachment.p7s>


More information about the texworks mailing list.