Add a Turkish-language OCR benchmark harness

Right now we claim Turkish-focused accuracy in the README, but there are no numbers behind it. We should add a small, reproducible benchmark.

**What to build**

- Run OpenCR over a fixed set of public-domain Turkish PDFs (~50 pages total)
- Measure Word Error Rate and Character Error Rate against gold-standard transcripts
- Run the same fixtures through Tesseract, Surya, PaddleOCR, and Marker for comparison
- Publish the resulting table at `benchmarks/RESULTS.md` and link it from the README
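
For anyone picking this up, here is a minimal sketch of how the two metrics could be computed with no third-party dependencies. The function names (`edit_distance`, `normalize`, `cer`, `wer`) are illustrative, not existing OpenCR code. The normalization step (Unicode NFC plus whitespace collapsing) is an assumption; the actual policy on case, punctuation, and whitespace still needs to be decided and documented alongside the results.

```python
import unicodedata
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions, insertions, and deletions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalize(text: str) -> str:
    # Assumed policy: NFC so that composed and decomposed forms of
    # letters such as "ğ" compare equal, then collapse whitespace runs.
    return " ".join(unicodedata.normalize("NFC", text).split())


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: character edits / reference length."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("reference transcript is empty")
    return edit_distance(ref, hyp) / len(ref)


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word edits / reference word count."""
    ref, hyp = normalize(reference).split(), normalize(hypothesis).split()
    if not ref:
        raise ValueError("reference transcript is empty")
    return edit_distance(ref, hyp) / len(ref)
```

Both rates can exceed 1.0 when the OCR output contains many insertions, so the results table should not assume a 0 to 1 range. Deliberately, nothing here lowercases text: Python's default case mapping does not follow Turkish rules for dotted and dotless i, so case folding would need its own explicit handling if we want it.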

**Where things live**

Fixtures and gold transcripts go under `benchmarks/fixtures/`, the runner script lives at `benchmarks/run.py`, and comparison tooling goes under `benchmarks/compare/`.
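
As a starting point for the runner, here is a rough sketch of how fixtures could be paired with their gold transcripts. It assumes a naming convention this issue does not specify: each `<name>.pdf` sits next to a UTF-8 `<name>.txt` in the same directory. `pair_fixtures` is a hypothetical helper, not existing code.

```python
from pathlib import Path


def pair_fixtures(fixtures_dir: Path) -> list[tuple[Path, Path]]:
    """Return (pdf, gold_transcript) pairs in a stable, sorted order.

    Assumed convention: <name>.pdf is paired with <name>.txt.
    """
    pairs = []
    for pdf in sorted(fixtures_dir.glob("*.pdf")):
        gold = pdf.with_suffix(".txt")
        if not gold.exists():
            # Fail loudly so a missing transcript cannot silently
            # shrink the benchmark set.
            raise FileNotFoundError(f"no gold transcript for {pdf.name}")
        pairs.append((pdf, gold))
    return pairs
```

Sorting the paths keeps the run order deterministic across machines, which helps with the reproducibility goal.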

**Why**

Even informal numbers are more useful than the silence we have now. This is also a great way for new contributors to help — no model code needed, mostly careful PDF curation and a bit of scripting.

Good first issue.
