PDF OCR

Self-hosted, GPU-accelerated OCR web app that converts scanned or image-based PDFs into searchable, editable documents.

Upload a PDF and get back:

Searchable PDF — original layout preserved with an invisible, selectable text layer
Markdown — structured plain text with tables and images extracted
Word (.docx) — editable document ready for further editing

Built with PaddleOCR and PaddleOCR-VL for high-accuracy Chinese and multilingual document recognition. Runs entirely on your own hardware — no data leaves your network.
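An invisible text layer like the one in the searchable PDF is usually made by drawing the recognized text over the page image in an invisible render mode. When OCR runs on a page rasterized at some DPI, its bounding boxes are in pixels, but PDF coordinates are in points (1/72 inch), so every box has to be rescaled. The sketch below shows that conversion plus a simple font-size heuristic. It is a minimal illustration, not this project's code: the `Box` type, the DPI, and the average-glyph-width heuristic are all assumptions.

```python
from dataclasses import dataclass

POINTS_PER_INCH = 72.0


@dataclass
class Box:
    """Axis-aligned box with a top-left origin (OCR output uses pixels)."""
    x0: float
    y0: float
    x1: float
    y1: float


def pixels_to_points(box: Box, dpi: float) -> Box:
    """Rescale a pixel box from a page rendered at `dpi` into PDF points."""
    scale = POINTS_PER_INCH / dpi
    return Box(box.x0 * scale, box.y0 * scale, box.x1 * scale, box.y1 * scale)


def fit_font_size(text: str, box: Box, avg_char_width: float = 0.5) -> float:
    """Heuristic font size that makes `text` roughly span the box width.

    Assumes each glyph is about `avg_char_width` em wide (about 0.5 for Latin
    text, closer to 1.0 for CJK) and never exceeds the box height.
    """
    if not text:
        return 0.0
    width = box.x1 - box.x0
    height = box.y1 - box.y0
    return min(width / (len(text) * avg_char_width), height)
```

With PyMuPDF, the converted box and size could be passed to `Page.insert_text(..., render_mode=3)`. Render mode 3 draws glyphs invisibly while keeping them selectable and searchable, and the insertion point is the baseline start, typically `(x0, y1)`. PyMuPDF uses a top-left origin like image coordinates; raw PDF content streams use a bottom-left origin, so y would need flipping there.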

Features

GPU-accelerated — CUDA-powered OCR and vision-language model inference via PaddlePaddle
Multiple output formats — searchable PDF, Markdown (with images), and Word from a single upload
Batch processing — upload multiple PDFs at once; jobs run in the background
Multilingual UI — Traditional Chinese (繁體中文), Simplified Chinese (简体中文), and English
Smart Chinese text handling — automatic Simplified-to-Traditional conversion for Traditional Chinese users; Simplified Chinese users get native output
Real-time progress — live progress bar and status updates per page
Self-hosted & private — everything runs locally in Docker; your documents never leave your server
Single-container deployment — one docker compose up and you're running
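Simplified-to-Traditional conversion cannot be done character by character, because one Simplified character can map to different Traditional characters depending on the word: 发 becomes 髮 in 头发 ("hair") but 發 in 发现 ("discover"). A common technique is greedy longest-match lookup against a phrase table. The sketch below shows that technique with a toy table. It illustrates the idea only and is not the project's converter or dictionary.

```python
# Toy phrase table; a real converter needs tens of thousands of entries.
S2T = {
    "头发": "頭髮",  # "hair": here 发 becomes 髮 ...
    "发": "發",      # ... but 發 in most other words (发现 -> 發現)
    "头": "頭",
    "现": "現",
    "简": "簡",
    "体": "體",
}


def to_traditional(text: str, table: dict[str, str] = S2T) -> str:
    """Greedy longest-match conversion from Simplified to Traditional Chinese."""
    max_len = max(map(len, table), default=1)
    out = []
    i = 0
    while i < len(text):
        # Try the longest candidate first so multi-character phrases win.
        for n in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + n]
            if piece in table:
                out.append(table[piece])
                i += n
                break
        else:
            out.append(text[i])  # no entry: keep the character unchanged
            i += 1
    return "".join(out)
```

Characters that are identical in both scripts, such as 中 and 文, need no table entry and pass through unchanged.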

Quick Start

Prerequisites: an NVIDIA GPU, Docker with the Compose plugin, and the NVIDIA Container Toolkit

git clone https://github.com/cyanyux/pdf-ocr.git
cd pdf-ocr
docker compose up -d --build

Open http://localhost:5000 in your browser and start uploading PDFs.

The first launch takes a few minutes to download the model weights (~2 GB). Subsequent starts skip this step and come up almost immediately.
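Because the first start can take several minutes, scripts that drive the API should wait for the service before uploading. Below is a minimal standard-library sketch that polls the health endpoint. It assumes the endpoint returns HTTP 200 once the server accepts requests; whether the model weights have also finished downloading by then depends on the app. The helper names are hypothetical.

```python
import time
import urllib.request
from typing import Callable

HEALTH_URL = "http://localhost:5000/api/health"


def http_ok(url: str, timeout: float = 5.0) -> bool:
    """True if `url` answers with HTTP 200; any connection failure means not ready."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError, HTTPError and socket timeouts all derive from OSError
        return False


def wait_until_ready(probe: Callable[[], bool], timeout: float = 600.0,
                     interval: float = 5.0,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call `probe` every `interval` seconds until it succeeds or `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

Call `wait_until_ready(lambda: http_ok(HEALTH_URL))`. It returns `False` if the service is still not answering after ten minutes.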

Configuration

All settings are optional environment variables (set in docker-compose.yml):

Variable	Default	Description
`PDF_OCR_DEVICE`	`auto`	Force `cpu` to disable GPU
`PDF_OCR_CUDA_VISIBLE_DEVICES`	unset	Select a specific GPU (e.g. `0`)
`PDF_OCR_MAX_UPLOAD_MB`	`500`	Max upload size per file (MB)
`PDF_OCR_MODEL_IDLE_TIMEOUT`	`1800`	Seconds before unloading idle models from VRAM
`PDF_OCR_CLEANUP_INTERVAL`	`3600`	Output file cleanup interval (seconds)
`PDF_OCR_MAX_FILE_AGE`	`3600`	Delete output files older than this (seconds)
`PDF_OCR_DISABLE_HPI`	`false`	Set to `true` to disable PaddleOCR high-performance inference (HPI/ONNX) GPU acceleration
`SECRET_KEY`	auto-generated	Flask secret key for CSRF
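Below is a sketch of how these variables might be read and applied, with the documented defaults as fallbacks. It assumes that "MB" means mebibytes and that boolean flags accept `true`/`1`/`yes`/`on`. `Settings`, `load_settings`, and `delete_expired` are hypothetical names, not the app's code. `delete_expired` illustrates what `PDF_OCR_MAX_FILE_AGE` implies: any output file whose modification time is older than the limit is removed.

```python
import os
import time
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional


@dataclass
class Settings:
    device: str  # "auto" by default; "cpu" forces CPU (other values are app-specific)
    max_upload_bytes: int
    model_idle_timeout: int
    cleanup_interval: int
    max_file_age: int
    disable_hpi: bool


def _flag(value: str) -> bool:
    """Interpret common truthy strings as True."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the variables in the table above, falling back to the documented defaults."""
    return Settings(
        device=env.get("PDF_OCR_DEVICE", "auto").lower(),
        max_upload_bytes=int(env.get("PDF_OCR_MAX_UPLOAD_MB", "500")) * 1024 * 1024,
        model_idle_timeout=int(env.get("PDF_OCR_MODEL_IDLE_TIMEOUT", "1800")),
        cleanup_interval=int(env.get("PDF_OCR_CLEANUP_INTERVAL", "3600")),
        max_file_age=int(env.get("PDF_OCR_MAX_FILE_AGE", "3600")),
        disable_hpi=_flag(env.get("PDF_OCR_DISABLE_HPI", "false")),
    )


def delete_expired(directory: Path, max_age: float,
                   now: Optional[float] = None) -> list[Path]:
    """Delete regular files under `directory` last modified more than `max_age` seconds ago."""
    now = time.time() if now is None else now
    removed = []
    for path in list(directory.rglob("*")):  # materialize before deleting anything
        if path.is_file() and now - path.stat().st_mtime > max_age:
            path.unlink()
            removed.append(path)
    return removed
```

A background thread or scheduler would then call `delete_expired(output_dir, settings.max_file_age)` every `settings.cleanup_interval` seconds, where `output_dir` is a placeholder for whatever directory holds finished jobs.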

API

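The conversion endpoints (`/api/ocr`, `/api/markdown`, `/api/word`, `/api/export`) accept a multipart file upload and return JSON containing a job ID. Processing is asynchronous: poll `/api/job/<id>` for status and progress, then fetch the finished output from `/api/download/...`.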

Method	Endpoint	Description
`POST`	`/api/ocr`	Convert PDF to searchable PDF
`POST`	`/api/markdown`	Convert PDF to Markdown (zip with images)
`POST`	`/api/word`	Convert PDF to Word (.docx)
`POST`	`/api/export`	Markdown and Word from a single PaddleOCR-VL pass
`GET`	`/api/job/<id>`	Poll job status and progress
`POST`	`/api/cancel/<id>`	Cancel a running job
`GET`	`/api/download/...`	Download completed output
`GET`	`/api/health`	Health check (device, GPU status)
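Here is a minimal standard-library client for the upload and poll cycle. The form field name (`file`), the JSON keys (`job_id`, `status`), and the terminal status values are assumptions for illustration. Check the JSON your running instance actually returns and adjust them.

```python
import json
import os
import time
import urllib.request
import uuid
from typing import Callable

BASE_URL = "http://localhost:5000"

# Assumed terminal values of the job's "status" field.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def build_multipart(field: str, filename: str, data: bytes) -> tuple[bytes, str]:
    """Encode a single file as multipart/form-data; returns (body, content_type)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def submit(endpoint: str, pdf_path: str, field: str = "file") -> dict:
    """POST one PDF to a conversion endpoint such as /api/ocr; returns the JSON reply."""
    with open(pdf_path, "rb") as f:
        body, content_type = build_multipart(field, os.path.basename(pdf_path), f.read())
    request = urllib.request.Request(
        BASE_URL + endpoint, data=body,
        headers={"Content-Type": content_type}, method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def fetch_json(url: str) -> dict:
    """GET `url` and decode the JSON body."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def wait_for_job(job_id: str, fetch: Callable[[str], dict] = fetch_json,
                 interval: float = 1.0, timeout: float = 3600.0,
                 sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll /api/job/<id> until the assumed "status" field reaches a terminal value."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch(f"{BASE_URL}/api/job/{job_id}")
        if job.get("status") in TERMINAL_STATUSES:
            return job
        if time.monotonic() > deadline:
            raise TimeoutError(f"job {job_id} did not finish within {timeout} s")
        sleep(interval)
```

Typical use: `job = submit("/api/ocr", "scan.pdf")`, then `wait_for_job(job["job_id"])`, then download the result through the `/api/download/...` route. Passing `fetch` and `sleep` as parameters keeps the polling loop testable without a running server.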

Tech Stack

OCR Engine: PaddleOCR 3.4 + PaddleOCR-VL-1.5
GPU Framework: PaddlePaddle 3.2 with CUDA 12.6
Backend: Python / Flask / Gunicorn
Frontend: Vanilla JS single-page app
Container: NVIDIA CUDA 12.6.3 + cuDNN on Ubuntu 24.04
Document conversion: Pandoc, PyMuPDF, python-docx

Troubleshooting

Check service health:

curl http://localhost:5000/api/health

Verify GPU access inside container:

docker exec pdf-ocr nvidia-smi

GPU not detected?

Ensure the NVIDIA Container Toolkit is installed and registered with Docker: sudo nvidia-ctk runtime configure --runtime=docker
Restart Docker after driver updates: sudo systemctl restart docker
Rebuild: docker compose down && docker compose up -d --build
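For scripted checks, such as a smoke test after `docker compose up`, the `nvidia-smi` check can be wrapped in Python. `nvidia-smi -L` prints one `GPU <n>: <name> (UUID: ...)` line per visible device. The container name `pdf-ocr` matches the `docker exec` command above. The helper names are hypothetical.

```python
import subprocess


def parse_gpu_list(output: str) -> list[str]:
    """Keep the per-device lines ('GPU 0: ... (UUID: ...)') from `nvidia-smi -L` output."""
    return [line.strip() for line in output.splitlines() if line.startswith("GPU ")]


def gpus_in_container(container: str = "pdf-ocr") -> list[str]:
    """Run `nvidia-smi -L` inside the container; empty list if Docker or the GPU is unavailable."""
    try:
        result = subprocess.run(
            ["docker", "exec", container, "nvidia-smi", "-L"],
            capture_output=True, text=True, timeout=30, check=True,
        )
    except (FileNotFoundError, subprocess.CalledProcessError, subprocess.TimeoutExpired):
        return []
    return parse_gpu_list(result.stdout)
```

An empty list from `gpus_in_container()` means the container cannot see a GPU, which points back to the toolkit and driver steps above.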

Security

This service has no built-in authentication. Do not expose it directly to the internet. Use a reverse proxy (Nginx, Caddy, Traefik) with access control for production deployments.

License

MIT
