PDF extraction that checks its own work. #2 in reading-order accuracy, with zero AI, zero GPU, and zero cost.
Updated Jun 10, 2026 - Python
ODL-first PDF ingestion PoC with optional eSearch-OCR v5 repair and a Vue preview/export UI.
Production-grade internal OCR microservice built on FastAPI, Dramatiq, PostgreSQL, and MinIO. It performs deterministic PDF text extraction via opendataloader-pdf in isolated subprocesses, deduplicates uploads by content hash, provides full OpenTelemetry (OTEL), Prometheus, and Grafana observability, and ships as a Docker deployment.
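Content-hash dedup, as listed for the microservice above, can be sketched as follows. This is a minimal illustration, not the repository's code: it assumes each upload is keyed by the SHA-256 digest of its raw bytes, and the class `ContentHashIndex` and its methods are hypothetical names. A real service would likely persist digests in its database (e.g. PostgreSQL) rather than in an in-memory dict.

```python
import hashlib


class ContentHashIndex:
    """Hypothetical in-memory dedup index keyed by SHA-256 of the file bytes."""

    def __init__(self) -> None:
        # Maps content digest -> job id of the first submission with that content.
        self._seen: dict[str, str] = {}

    @staticmethod
    def digest(data: bytes) -> str:
        # Identical bytes always produce the same hex digest, regardless of filename.
        return hashlib.sha256(data).hexdigest()

    def submit(self, data: bytes, job_id: str) -> tuple[str, bool]:
        """Return (job_id_to_use, is_new); duplicates reuse the original job id."""
        key = self.digest(data)
        existing = self._seen.get(key)
        if existing is not None:
            return existing, False
        self._seen[key] = job_id
        return job_id, True


if __name__ == "__main__":
    index = ContentHashIndex()
    pdf = b"%PDF-1.7 example bytes"
    print(index.submit(pdf, "job-1"))  # ('job-1', True)
    print(index.submit(pdf, "job-2"))  # ('job-1', False)
```

Hashing the bytes rather than the filename means a re-upload of the same PDF under a different name still maps to the existing extraction job, so the extraction step runs only once per unique document.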