Integrate synthetic data for text recognition training

The main improvement needed to make Ocrs more useful is higher text recognition accuracy (a lower error rate), especially on longer lines. Multilingual support will also require training examples in more languages. The plan is to expand the training data with synthetic images. Several existing synthetic text image generators might be useful:

1. https://github.com/ankush-me/SynthText
2. https://github.com/Belval/TextRecognitionDataGenerator ([forked here](https://github.com/robertknight/TextRecognitionDataGenerator) to add Pillow v10 support)
3. https://github.com/clovaai/synthtiger
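Whichever generator is used, it needs text lines to render. As a rough sketch of that input step (not part of Ocrs or of any of the projects above), the snippet below samples lines of varying length and cycles evenly through several languages. The `WORDS` lists and the `sample_line` / `sample_lines` helpers are hypothetical placeholders; a real setup would draw from a proper corpus for each language.

```python
import random

# Hypothetical per-language word lists (an assumption for illustration only).
# A real pipeline would load words or sentences from a corpus per language.
WORDS = {
    "de": ["über", "straße", "größe", "schön", "und", "der", "zwölf"],
    "en": ["the", "quick", "brown", "fox", "jumps", "over", "lazy", "dog"],
    "fr": ["été", "garçon", "où", "naïve", "le", "château", "cœur"],
}


def sample_line(rng, words, min_words, max_words):
    """Build one text line by joining randomly chosen words with spaces."""
    n = rng.randint(min_words, max_words)  # inclusive on both ends
    return " ".join(rng.choice(words) for _ in range(n))


def sample_lines(count, min_words=1, max_words=12, seed=0):
    """Return `count` (language, text) pairs, cycling through the languages.

    Raising `min_words` / `max_words` shifts the sample towards longer lines,
    which is where recognition accuracy is reported to be weakest.
    """
    rng = random.Random(seed)  # seeded so the sample is reproducible
    langs = sorted(WORDS)
    lines = []
    for i in range(count):
        lang = langs[i % len(langs)]
        lines.append((lang, sample_line(rng, WORDS[lang], min_words, max_words)))
    return lines


if __name__ == "__main__":
    for lang, text in sample_lines(6, min_words=4, max_words=10):
        print(f"{lang}\t{text}")
```

The sampled strings could then be written one per line to a text file and handed to whichever generator is chosen, assuming it accepts a list of strings as input; the exact option for that should be checked in the generator's own documentation.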

