CICT_OCR: An Easy-to-Use OCR (ஒளி எழுத்துணரி, "optical character recognizer")

❤️️❤️️Please star✨ it if you like❤️️❤️️

CICT-OCR: We at the Central Institute of Classical Tamil (CICT) developed this OCR to recognise old Tamil manuscripts with high accuracy. Unlike other available OCR modules such as Tesseract, EasyOCR and ocr_tamil, which are built primarily for document text rather than manuscripts, it is much more robust to tilted text.

Languages Supported 🔛

➡️ English

➡️ Tamil (தமிழ்)

Accuracy 🎯

✔️ English > 98%

✔️ Tamil > 95%
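The README does not state which metric these percentages use. One common choice for OCR is character accuracy, defined as 1 minus the character error rate (edit distance divided by the length of the reference text). The sketch below assumes that metric. Note that Python compares Unicode code points, so a Tamil letter with a vowel sign (for example கு, which is க plus ு) counts as more than one character here.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


def char_accuracy(reference: str, prediction: str) -> float:
    """1 - CER, clamped at 0. Works on code points, not grapheme clusters."""
    if not reference:
        return 1.0 if not prediction else 0.0
    cer = levenshtein(reference, prediction) / len(reference)
    return max(0.0, 1.0 - cer)


# A pair from the comparison table: EasyOCR dropped the final pulli (்).
print(char_accuracy("தாம்பரம்", "தாம்பரம"))  # 0.875 (1 edit over 8 code points)
```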

Comparison between Tesseract OCR, EasyOCR and OCR Tamil ⚖️

🏎️ 10-40% faster inference than EasyOCR and Tesseract
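The README does not say how the speed comparison was measured. If you want to check it on your own images, a minimal timing sketch is below. It assumes you wrap each engine in a function that accepts the same list of image paths; the warm-up calls keep one-off costs such as model loading out of the measurement.

```python
import statistics
import time
from typing import Callable, Sequence


def median_latency(predict: Callable[[Sequence[str]], object],
                   inputs: Sequence[str],
                   warmup: int = 1,
                   repeats: int = 5) -> float:
    """Median wall-clock seconds for one call of predict(inputs)."""
    for _ in range(warmup):  # exclude first-call overheads (loading, caching)
        predict(inputs)
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        predict(inputs)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)
```

For example, `median_latency(OCR().predict, ["test_images/1.jpg"])` times this library; time the other engines the same way on the same images and hardware before comparing. The median is used because it is less sensitive to occasional slow runs than the mean.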

OCR TAMIL 🏆	Tesseract	EasyOCR
வாழ்கவளமுடன்✅	க்‌ க்கஸாரகளள௮ஊகஎளமுடன்‌ ❌	வாழக வளமுடன்❌
தமிழ்வாழ்க✅	NO OUTPUT ❌	தமிழ்வாழ்க✅
கோபி ✅	NO OUTPUT ❌	ப99❌
தாம்பரம் ✅	NO OUTPUT ❌	தாம்பரம❌
நெடுஞ்சாலைத் ✅	NO OUTPUT ❌	நெடுஞ்சாலைத் ✅
அண்ணாசாலை ✅	NO OUTPUT ❌	ல@I9❌
ரெடிமேட்ஸ் ✅	NO OUTPUT ❌	ரெடிமேடஸ் ❌

The Tesseract and EasyOCR results were obtained with the Colab notebook, with Tamil and English selected as languages.
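To make the ✅/❌ marks in the table concrete, this sketch scores each engine by exact string match. It treats the OCR Tamil column as the expected text (the table marks every OCR Tamil output as correct) and "NO OUTPUT" as an empty string. Leading and trailing whitespace is ignored, which is why EasyOCR's நெடுஞ்சாலைத் row (with a trailing space) still counts as a match. This only illustrates the marking; it is not the benchmark behind the accuracy or speed figures.

```python
from typing import List

# Rows of the comparison table above; "" stands for NO OUTPUT.
EXPECTED = ["வாழ்கவளமுடன்", "தமிழ்வாழ்க", "கோபி", "தாம்பரம்", "நெடுஞ்சாலைத்", "அண்ணாசாலை", "ரெடிமேட்ஸ்"]
TESSERACT = ["க்‌ க்கஸாரகளள௮ஊகஎளமுடன்‌", "", "", "", "", "", ""]
EASYOCR = ["வாழக வளமுடன்", "தமிழ்வாழ்க", "ப99", "தாம்பரம", "நெடுஞ்சாலைத் ", "ல@I9", "ரெடிமேடஸ்"]


def exact_match_rate(expected: List[str], predicted: List[str]) -> float:
    """Fraction of rows whose prediction equals the expected text (ends stripped)."""
    if len(expected) != len(predicted):
        raise ValueError("expected and predicted must have the same length")
    hits = sum(e.strip() == p.strip() for e, p in zip(expected, predicted))
    return hits / len(expected)


for name, outputs in [("Tesseract", TESSERACT), ("EasyOCR", EASYOCR)]:
    print(f"{name}: {exact_match_rate(EXPECTED, outputs):.0%}")
# Tesseract: 0%
# EasyOCR: 29%  (2 of 7 rows)
```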

Handwritten Text (Experimental)🧪

MODEL OUTPUT:
நிமிர்ந்த நன்னடை மேற்கொண்ட பார்வையும்
நிலத்தில் யார்க் கும் அஞ்சாத நெறிகளும் 
திமிர்ந்த ஞானச் செருக்கும் இருப்பதால் 
செம்மை மாதர் திறம்புவ தில்லையாம் 
அமிழ்ந்து பேரிரு ளாமறி யாமையில் 
அவல மெய்திக் கலையின்  வாழ்வதை 
உமிழ்ந்து தள்ளுதல் பெண்ணற மாகுமாம் 
உதய கன்ன உரைப்பது கேட்டிரோ 
பாரதியார் 
ஹேமந்த் ம

How to Install and Use OCR Tamil 👨🏼‍💻

Quick links🌐

📔 Detailed explanation in a Medium article

✍️ Experiment in Colab notebook

🤗 Test it in Hugging Face Spaces

Pip install instructions🐍

From the command line, run: pip install cict_ocr

In a Jupyter notebook, run: !pip install cict_ocr
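To confirm that the package was installed into the same Python environment you run your code from (a common pitfall with notebooks), you can query its installed version with the standard library's importlib.metadata (Python 3.8+). A minimal sketch, assuming the distribution name is cict_ocr as in the pip command:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# None means the install did not reach this interpreter.
print(installed_version("cict_ocr"))
```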

Python Usage - Single image inference

Text Recognition only

from cict_ocr.ocr import OCR

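image_path = "test_images/1.jpg"  # insert your own path here; forward slashes work on Windows and Linux
ocr = OCR()  # recognition only, no text detection
text_list = ocr.predict(image_path)  # list of recognised texts
print(text_list[0])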

## OUTPUT : நெடுஞ்சாலைத்
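A raw Windows path such as r"test_images\1.jpg" is not portable: on Linux the backslash becomes part of the file name. A small sketch using the standard library's pathlib to build the path from its parts so the same script runs on both systems:

```python
from pathlib import Path

# Join the parts with the "/" operator; pathlib uses the right separator per OS.
image_path = Path("test_images") / "1.jpg"
print(image_path.as_posix())  # test_images/1.jpg
```

Whether `ocr.predict` accepts `Path` objects directly is not documented here, so passing `str(image_path)` is the safer assumption.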

Text Detect + Recognition

from cict_ocr.ocr import OCR

image_path = "test_images/0.jpg"  # insert your own image path here
ocr = OCR(detect=True)  # detect text regions first, then recognise each one
texts = ocr.predict(image_path)
print(" ".join(texts))

## OUTPUT : கொடைக்கானல் Kodaikanal

Batch inference mode 💻

Text Recognition only

from cict_ocr.ocr import OCR

image_paths = ["test_images/1.jpg", "test_images/2.jpg"]  # insert your own image paths here
ocr = OCR()
text_list = ocr.predict(image_paths)  # one recognised text per image

for text in text_list:
    print(text)

## OUTPUT : நெடுஞ்சாலைத்
## OUTPUT : கோபி
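To run batch mode over a whole folder instead of a hand-written list, the image paths can be collected first. A minimal sketch; the set of accepted file extensions is an assumption, so adjust it to the formats your images use:

```python
from pathlib import Path
from typing import List

# ASSUMPTION: common raster formats; extend or trim as needed.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp", ".tif", ".tiff"}


def list_images(folder: str) -> List[str]:
    """Image files directly inside `folder` (not recursive), sorted, as strings."""
    return sorted(
        str(p) for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
```

For example, `text_list = ocr.predict(list_images("test_images"))`. Sorting keeps the output order predictable, so each result can be matched back to its file.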

Text Detect + Recognition

from cict_ocr.ocr import OCR

image_paths = ["test_images/0.jpg", "test_images/tamil_sentence.jpg"]  # insert your own image paths here
ocr = OCR(detect=True)
text_list = ocr.predict(image_paths)  # one list of detected words per image

for words in text_list:
    print(" ".join(words))

## OUTPUT : கொடைக்கானல் Kodaikanal 
## OUTPUT : செரியர் யற்கை மூலிகைகளில் இருந்து ஈர்த்தெடுக்க்கப்பட்ட வீரிய உட்பொருட்களை உள்ளடக்கி எந்த இரசாயன சேர்க்கைகளும் இல்லாமல் உருவாக்கப்பட்ட இந்தியாவின் முதல் சித்த தயாரிப்பு
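When saving batch results, writing JSON with `ensure_ascii=False` and UTF-8 encoding keeps Tamil text human-readable instead of turning it into `\u0b95`-style escapes. A minimal sketch, assuming each result is a list of words as in the detect + recognition example; the record layout is just one possible choice:

```python
import json
from pathlib import Path
from typing import List


def save_results(image_paths: List[str], results: List[List[str]], out_file: str) -> None:
    """Write one {"image", "text"} record per image to a UTF-8 JSON file."""
    if len(image_paths) != len(results):
        raise ValueError("image_paths and results must have the same length")
    records = [{"image": path, "text": " ".join(words)}
               for path, words in zip(image_paths, results)]
    # ensure_ascii=False writes Tamil characters as-is rather than escaping them.
    Path(out_file).write_text(json.dumps(records, ensure_ascii=False, indent=2),
                              encoding="utf-8")
```

For example, `save_results(paths, ocr.predict(paths), "results.json")`, where `paths` is your list of image paths.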

Advanced usage🚀

The OCR class can be initialized with the following parameters, depending on your requirements:

1. Confidence of each word -> OCR(details=1)
2. Bounding box and confidence of each word -> OCR(detect=True, details=2)
3. To change the CRAFT text detection settings -> OCR(detect=True, text_threshold=0.5, link_threshold=0.1, low_text=0.30)
4. To increase the batch size for text recognition -> OCR(batch_size=16)  # set according to available memory
5. To choose the language(s) to extract -> OCR(lang=["tamil"])  # the list can contain "english", "tamil" or both; defaults to both languages
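With word confidences available (option 1), a common next step is to drop low-confidence words. The exact output structure of `OCR(details=1)` is not documented here, so this sketch ASSUMES each item is a `(text, confidence)` pair with confidence between 0 and 1; check the real output before using it. The threshold of 0.8 is an arbitrary example.

```python
from typing import Iterable, List, Tuple


def keep_confident(pairs: Iterable[Tuple[str, float]], min_conf: float = 0.8) -> List[str]:
    """Keep the texts whose confidence is at least `min_conf`.

    ASSUMPTION: items are (text, confidence) pairs; adapt to the actual
    structure returned by OCR(details=1).
    """
    return [text for text, conf in pairs if conf >= min_conf]


# Illustrative values only, not real model output.
sample = [("கோபி", 0.97), ("ப99", 0.41)]
print(keep_confident(sample))  # ['கோபி']
```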

Tested with Python 3.10 on Windows and Linux (Ubuntu 22.04) machines.

Applications⚡

ADAS (advanced driver-assistance system) navigation based on signboards + maps (hybrid approach) 🚁
License plate recognition 🚘

Limitations⛔

Reading document text is not supported, because the library does not provide:

➡️Automatic paragraph identification

➡️Orientation detection

➡️Skew correction

➡️Reading order prediction

➡️Document unwarping

➡️Text detection optimized for document text

(Workaround: bring your own models for the above cases and use OCR Tamil only for text recognition.)

Text that appears in rotated form cannot be read.
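For the missing reading-order prediction, one simple heuristic you could bring yourself is to group detected words into lines by vertical position and then read each line left to right. The box format below, `(text, (x_min, y_min, x_max, y_max))`, is HYPOTHETICAL; adapt it to whatever `OCR(detect=True, details=2)` actually returns.

```python
from typing import List, Tuple

# HYPOTHETICAL input format: (text, (x_min, y_min, x_max, y_max)) in pixels.
Word = Tuple[str, Tuple[float, float, float, float]]


def _centre_y(word: Word) -> float:
    _, (_, y_min, _, y_max) = word
    return (y_min + y_max) / 2


def reading_order(words: List[Word], line_tol: float = 0.5) -> List[str]:
    """Group words into lines top to bottom, then join each line left to right.

    A word joins the current line when its vertical centre is within
    line_tol * (word height) of the centre of that line's first word.
    """
    lines: List[List[Word]] = []
    for word in sorted(words, key=_centre_y):
        _, (_, y_min, _, y_max) = word
        height = y_max - y_min
        if lines and abs(_centre_y(word) - _centre_y(lines[-1][0])) <= line_tol * height:
            lines[-1].append(word)
        else:
            lines.append([word])
    # Within each line, order words by x_min (left edge).
    return [" ".join(text for text, _ in sorted(line, key=lambda w: w[1][0]))
            for line in lines]
```

This assumes horizontal, single-column text; tilted lines or multi-column pages will break it, and a dedicated layout model is the more robust option.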

Only the Tamil recognition model is our own; the English model is taken from the open-source implementation of PARSeq.

Acknowledgements 👏

Text detection - CRAFT text detection

Text recognition - PARSeq
