pdf-preprocessing

Here are 2 public repositories matching this topic...

thokchomthoithoibasingh-create / OCR_Project

OCR System for Extracting Text from Scanned PDF Documents using PaddleOCR and Streamlit

python ocr image-processing text-extraction pdf-to-text streamlit paddleocr document-digitization pdf-preprocessing

Updated Jun 22, 2026
Python

Node0 / morphic

High-fidelity OCR and pre-RAG pipeline processor featuring: (1) Tesseract OCR; (2) built-in cross-line dehyphenation with real-word verification; (3) support for TIFF series and JPEG 2000 (JPX) for high-fidelity PDF sources, with logistically significant size savings. Morphic assists in pre-RAG PDF preparation for analysis, large-scale ingestion, and agentic analysis.

ocr metadata-extraction layout-analysis document-segmentation rag-ingestion pdf-preprocessing structural-parsing ai-document-ingest

Updated Jun 12, 2026
Python
