To solve that issue you have to install Tesseract OCR. The procedure to at least get it started is:
sudo apt update
sudo apt install tesseract-ocr
pip3 install pytesseract
and then download the test images:
cd ~/catkin_ws/src/
git clone https://github.com/madmaze/pytesseract.git
cd ~/catkin_ws/src/pytesseract/tests
cp -r data ~/catkin_ws/src/
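Before running the test, it can help to confirm that the pieces above are actually in place. The following is only a minimal sketch: `check_setup` is a hypothetical helper name (not part of pytesseract), and its default data path assumes the `data` folder was copied to `~/catkin_ws/src/` as described.

```python
import shutil
from pathlib import Path


def check_setup(binary="tesseract", data_dir="~/catkin_ws/src/data"):
    """Return a list of human-readable problems; an empty list means OK.

    Hypothetical helper: the defaults assume the steps described in this post.
    """
    problems = []
    # shutil.which returns None when the executable is not on PATH
    if shutil.which(binary) is None:
        problems.append(f"'{binary}' executable not found on PATH")
    # expanduser resolves a leading ~ to the home directory
    if not Path(data_dir).expanduser().is_dir():
        problems.append(f"test image folder '{data_dir}' does not exist")
    return problems


if __name__ == "__main__":
    for problem in check_setup():
        print("Setup problem:", problem)
```

If nothing is printed, the binary and the image folder were both found.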
And then execute the test:
try:
    from PIL import Image
except ImportError:
    import Image
import pytesseract
On my side I'm hitting the issue below; maybe you know how to fix it. I tried downloading the trained models from the tesseract-ocr/tessdata GitHub repository (trained models with support for the legacy and LSTM OCR engines) and copying them into /usr/local/share/tessdata, but that didn't work:
['eng', 'osd']
Traceback (most recent call last):
  File "testeract_test.py", line 19, in <module>
    print(pytesseract.image_to_string(Image.open('data/test-european.jpg'), lang='fra'))
  File "/home/user/.local/lib/python3.8/site-packages/pytesseract/pytesseract.py", line 409, in image_to_string
    return {
  File "/home/user/.local/lib/python3.8/site-packages/pytesseract/pytesseract.py", line 412, in <lambda>
    Output.STRING: lambda: run_and_get_output(*args),
  File "/home/user/.local/lib/python3.8/site-packages/pytesseract/pytesseract.py", line 287, in run_and_get_output
    run_tesseract(**kwargs)
  File "/home/user/.local/lib/python3.8/site-packages/pytesseract/pytesseract.py", line 263, in run_tesseract
    raise TesseractError(proc.returncode, get_errors(error_string))
pytesseract.pytesseract.TesseractError: (1, 'Error opening data file /usr/share/tesseract-ocr/4.00/tessdata/fra.traineddata Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory. Failed loading language 'fra' Tesseract couldn't load any languages! Could not initialize tesseract.')
That's all I got working. At least the problem doesn't seem to be related to the system; it is rather that I don't know how to make tesseract work properly.
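The error message shows that tesseract looks for fra.traineddata in /usr/share/tesseract-ocr/4.00/tessdata, not in /usr/local/share/tessdata where the models were copied, which may be why copying them had no effect. Here is a minimal sketch of one possible workaround, assuming the copied folder really contains fra.traineddata: find the folder that holds the file and point TESSDATA_PREFIX at it before calling pytesseract. `find_tessdata_dir` is a hypothetical helper, not part of pytesseract, and this is untested on the Rosject.

```python
import os
from pathlib import Path

# Both paths come from this post: the first from the error message,
# the second is where the downloaded models were copied.
CANDIDATE_DIRS = [
    "/usr/share/tesseract-ocr/4.00/tessdata",
    "/usr/local/share/tessdata",
]


def find_tessdata_dir(lang, candidates=CANDIDATE_DIRS):
    """Return the first directory containing <lang>.traineddata, or None."""
    for directory in candidates:
        if (Path(directory) / f"{lang}.traineddata").is_file():
            return directory
    return None


if __name__ == "__main__":
    tessdata_dir = find_tessdata_dir("fra")
    if tessdata_dir is None:
        print("fra.traineddata not found in any candidate directory")
    else:
        # tesseract runs as a subprocess and inherits this variable
        os.environ["TESSDATA_PREFIX"] = tessdata_dir
        print("TESSDATA_PREFIX set to", tessdata_dir)
```

Once the variable is set, tesseract looks for every language in that folder, so eng.traineddata and osd.traineddata need to be there as well.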
# If you don't have tesseract executable in your PATH, include the following:
pytesseract.pytesseract.tesseract_cmd = r'/usr/bin/tesseract'
# Example tesseract_cmd = r'C:\Program Files (x86)\Tesseract-OCR\tesseract'
# Simple image to string
print(pytesseract.image_to_string(Image.open('data/test.png')))
# List of available languages
print(pytesseract.get_languages(config=''))
# French text image to string
print(pytesseract.image_to_string(Image.open('data/test-european.jpg'), lang='fra'))
# In order to bypass the image conversions of pytesseract, just use relative or absolute image path
# NOTE: In this case you should provide tesseract supported images or tesseract will return error
print(pytesseract.image_to_string('data/test.png'))
# Batch processing with a single file containing the list of multiple image file paths
print(pytesseract.image_to_string('data/images.txt'))
# Timeout/terminate the tesseract job after a period of time
try:
    print(pytesseract.image_to_string('data/test.jpg', timeout=2)) # Timeout after 2 seconds
    print(pytesseract.image_to_string('data/test.jpg', timeout=0.5)) # Timeout after half a second
except RuntimeError as timeout_error:
    # Tesseract processing is terminated
    pass
# Get bounding box estimates
print(pytesseract.image_to_boxes(Image.open('data/test.png')))
# Get verbose data including boxes, confidences, line and page numbers
print(pytesseract.image_to_data(Image.open('data/test.png')))
# Get information about orientation and script detection
print(pytesseract.image_to_osd(Image.open('data/test.png')))
# Get a searchable PDF
pdf = pytesseract.image_to_pdf_or_hocr('data/test.png', extension='pdf')
with open('test.pdf', 'w+b') as f:
    f.write(pdf) # pdf type is bytes by default
# Get HOCR output
hocr = pytesseract.image_to_pdf_or_hocr('data/test.png', extension='hocr')
# Get ALTO XML output
xml = pytesseract.image_to_alto_xml('data/test.png')
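The language list printed in this post contains only eng and osd, so the lang='fra' call is bound to fail. Below is a small sketch of a guard that checks the requested languages against that list before calling image_to_string. `missing_languages` is a hypothetical helper, not part of pytesseract; the "+" separator follows tesseract's convention for combining languages (for example eng+fra).

```python
def missing_languages(requested, available):
    """Return the requested language codes that are not in `available`.

    `requested` uses tesseract's "+"-separated form, e.g. "eng+fra".
    """
    return [code for code in requested.split("+") if code not in available]


if __name__ == "__main__":
    # ['eng', 'osd'] is the list printed in this post; in the real script
    # it would come from pytesseract.get_languages(config='').
    print(missing_languages("fra", ["eng", "osd"]))  # prints ['fra']
```

In the test script, an empty result would mean it is safe to call image_to_string with that lang value; a non-empty one names the traineddata files that still have to be installed.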
Here is the Rosject: https://app.theconstructsim.com/#/l/3db0d674/