First off, thanks for this library. It sped up the OCR in my code by 8-10x relative to pytesseract, which I was using previously.
I've noticed a couple of bugs in the (0.1.0) code though:
- Line 262 raised an AttributeError for me so I had to change it (I just used dirname(abspath(__file__)) instead, which sets the working dir to where pytessy is installed).
- justread_raw calls get_text, I think it should be calling get_text_raw instead.
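To make the first workaround concrete, here is a minimal sketch of what dirname(abspath(__file__)) evaluates to. The helper name package_dir is made up for illustration and is not part of pytessy, and I'm not claiming this is how the library should fix line 262.

```python
from os.path import abspath, dirname


def package_dir(module_file):
    """Return the absolute directory that contains module_file.

    Hypothetical helper, not part of pytessy. Inside a module you would
    call it as package_dir(__file__).
    """
    # abspath() resolves a relative path against the current working
    # directory; dirname() then drops the file name.
    return dirname(abspath(module_file))
```

Called from inside pytessy.py, this gives the folder where pytessy is installed, regardless of which script started the interpreter.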
For reference and for future users, I'm leaving my code here as a sample:
from pytessy.pytessy import PyTessy
ocr = PyTessy(r'C:\Program Files\Tesseract-OCR\tesseract')
# img is a numpy array, such as from np.array(PIL.Image.open(file))
raw = img.tobytes()  # named raw so it does not shadow the built-in bytes
h, w = img.shape[:2]
bpp = len(raw) // (w * h)  # bytes per pixel
txt = ocr.read(raw, w, h, bpp)
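For anyone adapting the sample, the buffer bookkeeping can be pulled into a small helper. This is only a sketch: image_buffer is a name I made up, not something pytessy provides, and it assumes img behaves like a numpy array (it has .tobytes() and .shape) with one byte per channel.

```python
def image_buffer(img):
    """Return (raw, width, height, bytes_per_pixel) for an array-like image.

    Hypothetical helper, not part of pytessy. img only needs .tobytes()
    and .shape, which a numpy array provides.
    """
    raw = img.tobytes()
    h, w = img.shape[:2]  # numpy image arrays are (rows, columns, ...)
    if h == 0 or w == 0:
        raise ValueError("image has no pixels")
    # The buffer must split evenly into w * h pixels, otherwise the
    # derived bytes-per-pixel value would be wrong.
    bpp, leftover = divmod(len(raw), w * h)
    if leftover or bpp == 0:
        raise ValueError("buffer size does not match the image shape")
    return raw, w, h, bpp
```

With the argument order used in the sample above, the call would then be ocr.read(*image_buffer(img)).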