Ground Truth issues on idp-leaderboard.org

Hello,
I was comparing the results of some high-performing models on samples where they score `0%`. I expected to see genuine LLM failures, but in the sample below Gemini Flash produced a good output and still lost its score because the ground truth appears to be inaccurate. Could you check this sample and others like it, and confirm or re-evaluate the scores? Or am I missing something?
https://idp-leaderboard.org/explore/?model=Gemini-3-Flash&benchmark=olmocr&task=present&sample=17_pg17_pg1_text_03

Ground Truth issues on idp-leaderboard.org #70
