to_image() / get_page_image omits filled AcroForm widgets because PdfDocument.init_forms() is never called #1367

@Aleksandar1932

Describe the bug

Rasterizing with page.to_image() goes through pdfplumber.display.get_page_image, which opens a pypdfium2.PdfDocument and loads the page without calling PdfDocument.init_forms(). Filled AcroForm field content is drawn via PDFium’s form layer (FPDF_FFLDraw); in pypdfium2 that only runs when a form environment exists, which init_forms() creates (and must be called after open, before loading pages).

So filled form field text can be missing from the PIL bitmap while PDF viewers still show it. This is PDFium rendering, not pdfminer text extraction (e.g. .chars).

Have you tried repairing the PDF?

Yes. pdfplumber.open(..., repair=True) does not fix this, because the PDF is not malformed and repair has nothing to correct. The gap is that get_page_image never initializes the PDFium form environment before calling render().

Code to reproduce the problem

import pdfplumber

with pdfplumber.open("filled_form.pdf") as pdf:
    im = pdf.pages[0].to_image(resolution=150).original
    im.save("out.png")

(Also reproducible with repair=True on the same file.)

PDF file

filled_form.pdf

Expected behavior

out.png should include the visible filled field values, consistent with a typical PDF viewer, when PDFium supports the form.

Actual behavior

out.png omits the filled field text; the rest of the page rasterizes as usual.

Screenshots

Issue: (image)

Expected: (image)

Environment

  • pdfplumber version: 0.11.9
  • OS: macOS

Additional context

Likely fix: In get_page_image, after successfully opening pypdfium2.PdfDocument, call pdfium_doc.init_forms() before pdfium_doc.get_page(page_ix), matching pypdfium2’s documented order.
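
The call order described above can be sketched as follows. This is a minimal illustration, not pdfplumber's actual get_page_image: the function name render_page_with_forms and its opener parameter are hypothetical, and the sketch assumes a pypdfium2 version in which PdfDocument.init_forms(), get_page(), render(scale=...), and to_pil() behave as the report describes.

```python
def render_page_with_forms(path, page_ix=0, resolution=72, opener=None):
    """Rasterize one page, initializing the form environment first.

    `opener` is a hypothetical hook (not part of pdfplumber or pypdfium2)
    that lets callers substitute the document class; it defaults to
    pypdfium2.PdfDocument.
    """
    if opener is None:
        import pypdfium2  # imported lazily so the sketch loads without it

        opener = pypdfium2.PdfDocument

    doc = opener(path)
    try:
        # Must run after opening the document and before loading any page;
        # otherwise the form layer is not drawn.
        doc.init_forms()
        page = doc.get_page(page_ix)
        # PDF user space is 72 units per inch, so scale = dpi / 72.
        bitmap = page.render(scale=resolution / 72)
        return bitmap.to_pil()
    finally:
        doc.close()
```

Calling render_page_with_forms("filled_form.pdf", resolution=150).save("out.png") should then produce a bitmap that includes the filled field values, provided PDFium supports the form type in the file.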

Related (not duplicate): #120 is about form values not appearing in CLI / extraction; the README documents AcroForm via pdfminer. This report is specifically about to_image() / get_page_image rasterization.
