Fix PdfAnnotation soundness by keeping parent page alive #206

Open
JustForFun88 wants to merge 2 commits into messense:main from JustForFun88:fix/annotation-soundness

@JustForFun88
Contributor

Closes #205

PdfAnnotation held a raw *mut pdf_annot without preventing the parent page from being freed. Since MuPDF's pdf_annot->page is a borrowed (non-ref-counted) back-pointer, dropping the PdfPage while annotations were still alive caused a use-after-free.

Changes:

  • PdfAnnotation now stores a ref-counted NonNull<pdf_page> that keeps the parent page alive
  • Added from_raw_keep_ref for borrowed annotation pointers (used by the iterator)
  • AnnotationIter keeps the page alive and uses correct refcounting

Added 3 soundness tests
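
The ownership rule described above can be sketched without MuPDF at all. This is a minimal model, not the PR's code: `Page` and `Annotation` are hypothetical stand-ins, and `Rc` plays the role that `pdf_keep_page`/`pdf_drop_page` play on the C side.

```rust
use std::rc::{Rc, Weak};

/// Hypothetical stand-in for a MuPDF page. `Rc` models the C refcount.
struct Page {
    number: u32,
}

/// Hypothetical stand-in for `PdfAnnotation`: it owns a counted
/// reference to its parent page instead of a borrowed back-pointer.
struct Annotation {
    page: Rc<Page>, // analogous to calling pdf_keep_page in the constructor
}

impl Annotation {
    fn new(page: &Rc<Page>) -> Self {
        Annotation { page: Rc::clone(page) }
    }

    /// Reads through the page reference; safe because the page cannot be
    /// freed while this annotation exists.
    fn page_number(&self) -> u32 {
        self.page.number
    }
}

/// Creates a page and an annotation, drops the caller's page handle,
/// and returns the annotation plus a weak probe onto the page.
fn annotation_outliving_page_handle() -> (Annotation, Weak<Page>) {
    let page = Rc::new(Page { number: 7 });
    let probe = Rc::downgrade(&page);
    let annot = Annotation::new(&page);
    drop(page); // the user's handle is gone, the annotation's is not
    (annot, probe)
}
```

In this model the page is released only when the last annotation holding it is dropped, which is the behavior the soundness tests are meant to check against the real bindings.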

@JustForFun88 JustForFun88 changed the title Fix PdfAnnotation soundness by keeping parent page alive Fix PdfAnnotation soundness by keeping parent page alive Mar 26, 2026
@messense messense requested a review from Copilot March 26, 2026 09:33
Contributor

Copilot AI left a comment

Pull request overview

Fixes a soundness issue where PdfAnnotation could outlive (and then dereference) its parent PdfPage, leading to use-after-free due to MuPDF’s non-refcounted back-pointer from pdf_annot to pdf_page.

Changes:

  • Make PdfAnnotation hold a ref-counted pdf_page pointer to keep the parent page alive.
  • Add from_raw_keep_ref and update AnnotationIter to handle borrowed annotation pointers with correct refcounting while keeping the page alive.
  • Add unit tests that exercise annotation properties, iteration-after-page-drop, deletion, and cross-page behavior.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| src/pdf/annotation.rs | PdfAnnotation now owns a ref-counted page handle and adds from_raw_keep_ref for borrowed pointers. |
| src/pdf/page.rs | AnnotationIter now retains the page and yields annotations with correct refcount semantics. |
| src/pdf/mod.rs | Registers the new annotation test module under #[cfg(test)]. |
| src/pdf/tests_annotation.rs | Adds soundness and behavior tests for annotation creation, iteration, and deletion. |

@JustForFun88
Contributor Author

@messense could you please rerun the CI workflow? It looks like a GitHub issue again.

Comment thread src/pdf/annotation.rs
        pdf_keep_page(context(), page);
        Self {
            inner: NonNull::new_unchecked(ptr),
            page: NonNull::new_unchecked(page),

Collaborator

Is there a reason you're storing the page pointer instead of retrieving it again in the Drop method using pdf_annot_page? Like this the page pointer is stored twice, once here and once inside the annotation.

Contributor Author

Yes, it is stored twice, in the annotation iterator and here, to ensure that the page outlives both the iterator and the PdfAnnotation.

We could potentially use a lifetime in AnnotationIter to reduce the number of page reference counts. However, using pdf_keep_page only in the iterator and relying on a lifetime for PdfAnnotation is not possible at the moment, since that would require lending iterators, which Rust does not support yet.
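
A rough sketch of that split, under stated assumptions: the iterator borrows the page for a lifetime `'a`, while each yielded item takes its own counted reference, because `Iterator::Item` cannot borrow from the iterator itself. `Page`, `Annot`, and `AnnotIter` are hypothetical mock types, with `Rc` standing in for MuPDF's page refcount.

```rust
use std::rc::Rc;

/// Hypothetical mock page that simply lists annotation ids.
struct Page {
    annots: Vec<u32>,
}

/// Each yielded annotation owns a counted page reference of its own.
struct Annot {
    page: Rc<Page>,
    id: u32,
}

/// The iterator only borrows the page, so it adds no refcount itself.
struct AnnotIter<'a> {
    page: &'a Rc<Page>,
    next: usize,
}

impl<'a> Iterator for AnnotIter<'a> {
    // The item carries no lifetime: it must be usable after the
    // iterator (and the caller's page handle) are gone.
    type Item = Annot;

    fn next(&mut self) -> Option<Annot> {
        let id = *self.page.annots.get(self.next)?;
        self.next += 1;
        Some(Annot { page: Rc::clone(self.page), id })
    }
}

/// Collects all annotations, then drops the caller's page handle.
fn collect_then_drop_page() -> Vec<Annot> {
    let page = Rc::new(Page { annots: vec![10, 20, 30] });
    let annots: Vec<Annot> = AnnotIter { page: &page, next: 0 }.collect();
    drop(page);
    annots
}
```

The collected items stay valid after the page handle is dropped because each one holds its own reference; only the iterator's reference is replaced by a borrow.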

Collaborator

I wasn't counting the iterator. I meant that it's stored on the Rust and on the C side.

Contributor Author

But the C code of pdf_new_annot itself says that the page pointer is only borrowed, as the page owns the annot:

static pdf_annot *
pdf_new_annot(fz_context *ctx, pdf_page *page, pdf_obj *obj)
{
	pdf_annot *annot;

	annot = fz_malloc_struct(ctx, pdf_annot);
	annot->refs = 1;
	annot->page = page; /* only borrowed, as the page owns the annot */
	annot->obj = pdf_keep_obj(ctx, obj);

	return annot;
}

And if you look at pdf_drop_annot, it drops only the annotation object; the page is not dropped:

void
pdf_drop_annot(fz_context *ctx, pdf_annot *annot)
{
	if (fz_drop_imp(ctx, annot, &annot->refs))
	{
		pdf_drop_obj(ctx, annot->obj);
		fz_free(ctx, annot);
	}
}

Collaborator

I'm aware, but the C side has a pointer to the page. My idea looks like this:

pub(crate) unsafe fn from_raw(ptr: *mut pdf_annot) -> Self {
    // Take a page reference through the annotation's own back-pointer
    let page = pdf_annot_page(context(), ptr);
    pdf_keep_page(context(), page);
    Self {
        inner: NonNull::new_unchecked(ptr),
    }
}

and

impl Drop for PdfAnnotation {
    fn drop(&mut self) {
        unsafe {
            let page = pdf_annot_page(context(), self.inner.as_ptr());
            pdf_drop_annot(context(), self.inner.as_ptr());
            pdf_drop_page(context(), page);
        }
    }
}

Contributor Author

@JustForFun88 JustForFun88 Mar 27, 2026

This does not work, at least with the current Rust API. pdf_delete_annot clears the page pointer:

// pdf-annot.c:1151-1154
/* Remove annot from page's list */
*annotptr = annot->next;

/* Annot may no longer borrow page pointer, since they are separated */
annot->page = NULL;

So after deletion, pdf_annot_page(context(), self.inner.as_ptr()) returns NULL, so the reference taken in from_raw is never released and the page leaks.
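
The leak can be modeled with a small refcount simulation. This is a sketch, not MuPDF: `MockPage`, `MockAnnot`, `keep`, and `drop_page` are hypothetical, and it assumes that dropping a NULL page is a no-op, as the leak described above implies.

```rust
use std::cell::Cell;

/// Hypothetical mock page with a manual refcount.
struct MockPage {
    refs: Cell<u32>,
}

/// Hypothetical mock annotation with a clearable, borrowed back-pointer
/// (models `annot->page`, which deletion sets to NULL).
struct MockAnnot<'p> {
    page: Cell<Option<&'p MockPage>>,
}

/// Models pdf_keep_page.
fn keep(page: &MockPage) {
    page.refs.set(page.refs.get() + 1);
}

/// Models pdf_drop_page; assumed to ignore NULL (here: `None`).
fn drop_page(page: Option<&MockPage>) {
    if let Some(p) = page {
        p.refs.set(p.refs.get() - 1);
    }
}

/// Strategy A: look the page up through the annotation at drop time.
/// Returns the page refcount left afterwards (1 means no leak).
fn lookup_at_drop(delete_first: bool) -> u32 {
    let page = MockPage { refs: Cell::new(1) };
    let annot = MockAnnot { page: Cell::new(Some(&page)) };
    keep(&page); // constructor keeps the page
    if delete_first {
        annot.page.set(None); // deletion clears the back-pointer
    }
    drop_page(annot.page.get()); // Drop re-reads the back-pointer
    page.refs.get()
}

/// Strategy B: the wrapper stores its own page pointer.
fn stored_pointer(delete_first: bool) -> u32 {
    let page = MockPage { refs: Cell::new(1) };
    let annot = MockAnnot { page: Cell::new(Some(&page)) };
    let stored: &MockPage = &page;
    keep(stored);
    if delete_first {
        annot.page.set(None);
    }
    drop_page(Some(stored)); // unaffected by the cleared back-pointer
    page.refs.get()
}
```

With deletion in the sequence, strategy A leaves one extra reference on the page, while strategy B returns to the starting count.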

Collaborator

Since the annotation doesn't belong to the page anymore after deleting it, shouldn't the page pointer be dropped as well in the delete_annotation method? I'm also not quite sure if there is a reason it takes a reference to an annotation, instead of an owned annotation since the delete method drops the annotation pointer.

Comment thread src/pdf/page.rs Outdated
@JustForFun88
Contributor Author

@ginnyTheCat, @itsjunetime, @messense

The suggestion for AnnotationIter<'a> is implemented. I also agree with @ginnyTheCat: delete_annotation now takes PdfAnnotation by value.

However, I'd prefer to keep the page field in PdfAnnotation. We could save 8 bytes per object, but it makes the code unnecessarily fragile. Since pdf_delete_annot sets annot->page = NULL, working around the lack of our own page field would force us to write something like this:

pub fn delete_annotation(&mut self, annot: PdfAnnotation) -> Result<(), Error> {
    let annot = ManuallyDrop::new(annot);
    unsafe {
        // Must capture the page pointer *before* C nulls it
        let page = pdf_annot_page(context(), annot.inner.as_ptr());
        ffi_try!(mupdf_pdf_delete_annot(
            context(),
            self.as_mut_ptr(),
            annot.inner.as_ptr()
        ))?;
        pdf_drop_annot(context(), annot.inner.as_ptr());
        pdf_drop_page(context(), page);
    }
    Ok(())
}

While this works, it creates an invisible invariant: the drop order matters, and you must capture the page pointer before the FFI call. Future maintainers would have to remember this sequence with no compiler help.

Removing the field from PdfAnnotation saves only 8 bytes. For comparison, a String on the stack takes up 24 bytes. In practice, a single page doesn't have thousands of annotations, so the memory difference is negligible. I don't think the space saved justifies the added fragility.
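
The size figures can be checked with `std::mem::size_of`. This is an illustrative sketch: `WithPage` and `WithoutPage` are hypothetical layouts that mirror the two designs with placeholder pointee types, and the `String` size is what current compilers produce rather than a language guarantee.

```rust
use std::mem::size_of;
use std::ptr::NonNull;

/// Hypothetical layout mirroring the PR: annotation pointer plus page pointer.
#[allow(dead_code)]
struct WithPage {
    inner: NonNull<u8>,
    page: NonNull<u8>,
}

/// Hypothetical layout mirroring the alternative: annotation pointer only.
#[allow(dead_code)]
struct WithoutPage {
    inner: NonNull<u8>,
}

/// Extra bytes per object paid for storing the page pointer.
fn extra_bytes_for_page_field() -> usize {
    size_of::<WithPage>() - size_of::<WithoutPage>()
}

/// Size of a `String` handle on the stack, for comparison.
fn string_handle_bytes() -> usize {
    size_of::<String>()
}
```

On a 64-bit target the extra field costs one pointer width (8 bytes), against three pointer widths (24 bytes) for a `String` handle.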

Development

Successfully merging this pull request may close these issues.

Annotation test uses a PDF without annotations, while appropriate test results in segfault

4 participants