Incorrect table extraction despite visible grid lines (nested boxes) #1357

@sjainam

Description

Describe the bug
pdfplumber fails to detect table grid lines when using table_settings with vertical_strategy="lines" and horizontal_strategy="lines", even though the PDF visually contains clear table borders and grid lines. As a result, tables are not detected or extracted as expected.

Have you tried repairing the PDF?

  • Yes. The issue persists even when opening the PDF with pdfplumber.open("sample.pdf", repair=True).

Code to reproduce the problem

import pdfplumber
pdf = pdfplumber.open("/home/jainamshah/Downloads/111.pdf")
im = pdf.pages[0].to_image()
im.debug_tablefinder(table_settings={"vertical_strategy": "lines", "horizontal_strategy": "lines"})
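
Before tuning table settings, it may help to check which kinds of objects the page actually contains, since the "lines" strategies can only work with drawn lines and rectangle sides. The following is a minimal diagnostic sketch, not part of the original report; the helper names `summarize_objects` and `inspect_pdf` are hypothetical, and the only pdfplumber features assumed are the page attributes `chars`, `lines`, `rects`, `curves`, and `images`.

```python
def summarize_objects(page):
    """Count the object types that could make up the visible grid.

    `page` is expected to expose the list attributes that a
    pdfplumber page provides: chars, lines, rects, curves, images.
    """
    return {
        "chars": len(page.chars),
        "lines": len(page.lines),
        "rects": len(page.rects),
        "curves": len(page.curves),
        "images": len(page.images),
    }


def inspect_pdf(path, page_index=0):
    """Open a PDF and summarize one page (hypothetical helper)."""
    # Imported lazily so summarize_objects stays dependency-free.
    import pdfplumber

    with pdfplumber.open(path) as pdf:
        return summarize_objects(pdf.pages[page_index])
```

If `lines` and `rects` are both zero while `images` is not, the grid is probably part of a raster image, and line-based strategies would have nothing to detect. If the grid shows up mostly under `curves`, that may explain fragmented or missing edges.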

PDF file

  • Attached PDF: sample.pdf
  • The PDF contains a clearly visible table with grid lines (both vertical and horizontal).
  • No selectable text is present (vector-based drawing).

Expected behavior
Since the table grid is clearly visible, pdfplumber should:

  • Detect vertical and horizontal ruling lines
  • Identify consistent intersections
  • Correctly infer table structure
  • Extract rows and columns accurately using line-based strategies
  • No cell should overlap, merge into, or appear inside another cell
Image
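
To make the expected behavior concrete, here is a simplified sketch of how ruling lines can be turned into cells: find where horizontal and vertical edges cross, then keep every rectangle whose four corners are all crossings. This is an illustration only and not pdfplumber's actual implementation; the function names, the edge tuples, and the `tol` parameter are assumptions made for the example.

```python
from itertools import product


def intersections(h_edges, v_edges, tol=0.0):
    """Return the set of (x, y) points where edges cross.

    h_edges: iterable of (x0, x1, y) horizontal segments.
    v_edges: iterable of (x, y0, y1) vertical segments.
    tol: how far an edge may stop short of the other and still count.
    """
    points = set()
    for (hx0, hx1, hy), (vx, vy0, vy1) in product(h_edges, v_edges):
        if hx0 - tol <= vx <= hx1 + tol and vy0 - tol <= hy <= vy1 + tol:
            points.add((vx, hy))
    return points


def cells_from_points(points):
    """Build (x0, top, x1, bottom) cells from neighbouring grid points.

    Simplification: a cell is accepted when all four corners are
    intersections; edge connectivity between corners is not checked.
    """
    xs = sorted({x for x, _ in points})
    ys = sorted({y for _, y in points})
    cells = []
    for top, bottom in zip(ys, ys[1:]):
        for left, right in zip(xs, xs[1:]):
            corners = {(left, top), (right, top), (left, bottom), (right, bottom)}
            if corners <= points:
                cells.append((left, top, right, bottom))
    return cells
```

In this toy model, a 2x2 grid whose middle vertical line has a 1 pt gap on either side of the center yields no cells at `tol=0`, because the shared center corner is never registered; with `tol=1` all four cells are recovered. That is one plausible way fragmented segments could lead to the malformed tables described below.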

Actual behavior

  • Table grid lines are detected only partially or as fragmented segments
  • Intersections are inconsistent
  • extract_tables() fails or returns malformed tables
  • Line-based strategies do not produce reliable results
  • In addition, one box is incorrectly detected inside another box (nested boxes)
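
One way to quantify these symptoms is to run the table finder with a few tolerance settings and count how many detected cells sit inside another cell. The sketch below is a suggestion, not a confirmed fix: `contains`, `nested_pairs`, and `compare_settings` are hypothetical helpers, the tolerance values are arbitrary starting points, and the only pdfplumber calls assumed are `page.find_tables(table_settings=...)` and the `cells` attribute of each returned table.

```python
def contains(outer, inner, tol=0.5):
    """True if bbox `inner` lies within bbox `outer`.

    Bboxes are (x0, top, x1, bottom); identical boxes do not count.
    """
    ox0, otop, ox1, obottom = outer
    ix0, itop, ix1, ibottom = inner
    return (
        outer != inner
        and ox0 - tol <= ix0
        and otop - tol <= itop
        and ix1 <= ox1 + tol
        and ibottom <= obottom + tol
    )


def nested_pairs(cells, tol=0.5):
    """Return (outer, inner) pairs where one cell is inside another."""
    return [(o, i) for o in cells for i in cells if contains(o, i, tol)]


def compare_settings(page, candidates):
    """Run find_tables once per settings dict and summarize the result."""
    report = []
    for settings in candidates:
        tables = page.find_tables(table_settings=settings)
        cells = [cell for table in tables for cell in table.cells]
        report.append({
            "settings": settings,
            "tables": len(tables),
            "cells": len(cells),
            "nested": len(nested_pairs(cells)),
        })
    return report


BASE = {"vertical_strategy": "lines", "horizontal_strategy": "lines"}
# Tolerance values below are assumptions to experiment with, not recommendations.
CANDIDATES = [
    BASE,
    {**BASE, "snap_tolerance": 5, "join_tolerance": 5, "intersection_tolerance": 5},
]
```

Calling `compare_settings(page, CANDIDATES)` on the affected page should show whether looser snapping and joining changes the table, cell, and nested-cell counts, which would help narrow down whether the problem lies in fragmented edges or in the way the boxes are drawn.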

Screenshots

Image

Environment

  • pdfplumber version: 0.11.9
  • Python version: 3.12.3
  • OS: Linux

Additional context
I have already raised a similar issue: #175
