Hi,
Do you support pytesseract in any way at the moment? I am able to install pytesseract through pip, but when I try to import it, it reports:
Traceback (most recent call last):
  File "test.py", line 2, in <module>
    import pytesseract
  File "/home/user/.local/lib/python2.7/site-packages/pytesseract/__init__.py", line 2, in <module>
    from .pytesseract import ALTONotSupported
  File "/home/user/.local/lib/python2.7/site-packages/pytesseract/pytesseract.py", line 89
    f"{tesseract_cmd} is not installed or it's not in your PATH."
^
I searched online, and this problem usually appears after pytesseract has been imported, when calling one of its functions. Although my case is different (here the error occurs while importing pytesseract), I tried the suggested solution of adding the following line to the Python file:
pytesseract.pytesseract.tesseract_cmd = '/home/user/.local/lib/python2.7/site-packages/tesseract'
But it didn't work and the error persists.
Hello @ZisenZhou ,
Are you working on your own rosject? If this is the case, maybe you could share here your rosject and instructions to reproduce this error so that we (or other users) can help you better.
Best,
To solve that issue you have to install Tesseract OCR. The procedure to at least get it starting is:
sudo apt update
sudo apt install tesseract-ocr
pip3 install pytesseract
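After installing, it can help to confirm that the tesseract binary is actually visible on PATH, since tesseract_cmd needs to point at the executable itself and not at a Python package directory. This is a minimal sketch using only the standard library. The helper name find_executable is just for illustration, and it assumes Python 3:

```python
import shutil


def find_executable(name="tesseract"):
    """Return the absolute path of `name` if it is on PATH, otherwise None."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_executable()
    if path is None:
        print("tesseract was not found on PATH")
    else:
        # This is the kind of value pytesseract.pytesseract.tesseract_cmd expects.
        print("tesseract found at:", path)
```

If this prints a path, that path is a reasonable candidate for tesseract_cmd.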
Then download the test images:
cd ~/catkin_ws/src/
git clone https://github.com/madmaze/pytesseract.git
cd ~/catkin_ws/src/pytesseract/tests
cp -r data ~/catkin_ws/src/
And then execute the test:
try:
    from PIL import Image
except ImportError:
    import Image
import pytesseract
On my side I'm hitting the issue below. Maybe you know how to fix it: I downloaded the trained models from the tesseract-ocr/tessdata GitHub repository (trained models with support for the legacy and LSTM OCR engines) and copied them into /usr/local/share/tessdata, but it didn't work:
['eng', 'osd']
Traceback (most recent call last):
  File "testeract_test.py", line 19, in <module>
    print(pytesseract.image_to_string(Image.open('data/test-european.jpg'), lang='fra'))
  File "/home/user/.local/lib/python3.8/site-packages/pytesseract/pytesseract.py", line 409, in image_to_string
    return {
  File "/home/user/.local/lib/python3.8/site-packages/pytesseract/pytesseract.py", line 412, in <lambda>
    Output.STRING: lambda: run_and_get_output(*args),
  File "/home/user/.local/lib/python3.8/site-packages/pytesseract/pytesseract.py", line 287, in run_and_get_output
    run_tesseract(**kwargs)
  File "/home/user/.local/lib/python3.8/site-packages/pytesseract/pytesseract.py", line 263, in run_tesseract
    raise TesseractError(proc.returncode, get_errors(error_string))
pytesseract.pytesseract.TesseractError: (1, 'Error opening data file /usr/share/tesseract-ocr/4.00/tessdata/fra.traineddata Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory. Failed loading language 'fra' Tesseract couldn't load any languages! Could not initialize tesseract.')
That's all I got working. At least it doesn't seem to be related to the system; I just don't know how to make Tesseract work properly.
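One thing the error message suggests: Tesseract looked for fra.traineddata in /usr/share/tesseract-ocr/4.00/tessdata/, so files copied to /usr/local/share/tessdata would not be picked up unless TESSDATA_PREFIX points there. Here is a rough sketch to check which directory actually contains a given language file. The helper names has_traineddata and find_language_dirs are made up for illustration, and the candidate directories are simply the ones mentioned in this thread:

```python
import os


def has_traineddata(tessdata_dir, lang):
    """Return True if <lang>.traineddata exists as a file in tessdata_dir."""
    return os.path.isfile(os.path.join(tessdata_dir, lang + ".traineddata"))


def find_language_dirs(lang, candidate_dirs):
    """Return the candidate directories that contain <lang>.traineddata."""
    return [d for d in candidate_dirs if has_traineddata(d, lang)]


if __name__ == "__main__":
    candidates = [
        os.environ.get("TESSDATA_PREFIX", ""),     # only used if the variable is set
        "/usr/share/tesseract-ocr/4.00/tessdata",  # path from the error message
        "/usr/local/share/tessdata",               # where the models were copied
    ]
    candidates = [d for d in candidates if d]
    print("fra.traineddata found in:", find_language_dirs("fra", candidates))
```

If the file only shows up under /usr/local/share/tessdata, either copying fra.traineddata into the directory from the error message or setting TESSDATA_PREFIX to the directory that holds it should be worth trying. On Ubuntu, the French data is also normally available through the tesseract-ocr-fra apt package.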
# If you don't have tesseract executable in your PATH, include the following:
pytesseract.pytesseract.tesseract_cmd = r'/usr/bin/tesseract'
# Example tesseract_cmd = r'C:\Program Files (x86)\Tesseract-OCR\tesseract'
# Simple image to string
print(pytesseract.image_to_string(Image.open('data/test.png')))
# List of available languages
print(pytesseract.get_languages(config=''))
# French text image to string
print(pytesseract.image_to_string(Image.open('data/test-european.jpg'), lang='fra'))
# In order to bypass the image conversions of pytesseract, just use relative or absolute image path
# NOTE: In this case you should provide tesseract supported images or tesseract will return error
print(pytesseract.image_to_string('data/test.png'))
# Batch processing with a single file containing the list of multiple image file paths
print(pytesseract.image_to_string('data/images.txt'))
# Timeout/terminate the tesseract job after a period of time
try:
    print(pytesseract.image_to_string('data/test.jpg', timeout=2)) # Timeout after 2 seconds
    print(pytesseract.image_to_string('data/test.jpg', timeout=0.5)) # Timeout after half a second
except RuntimeError as timeout_error:
    # Tesseract processing is terminated
    pass
# Get bounding box estimates
print(pytesseract.image_to_boxes(Image.open('data/test.png')))
# Get verbose data including boxes, confidences, line and page numbers
print(pytesseract.image_to_data(Image.open('data/test.png')))
# Get information about orientation and script detection
print(pytesseract.image_to_osd(Image.open('data/test.png')))
# Get a searchable PDF
pdf = pytesseract.image_to_pdf_or_hocr('data/test.png', extension='pdf')
with open('test.pdf', 'w+b') as f:
    f.write(pdf) # pdf type is bytes by default
# Get HOCR output
hocr = pytesseract.image_to_pdf_or_hocr('data/test.png', extension='hocr')
# Get ALTO XML output
xml = pytesseract.image_to_alto_xml('data/test.png')
Here is the Rosject: https://app.theconstructsim.com/#/l/3db0d674/
Hi, I've tried what you suggested but still got the same error: "{tesseract_cmd} is not installed or it's not in your PATH." Could the problem be that we're using Python 2.7?
Thank you for your help.
Hi,
I tested this in a Noetic Ubuntu 20 Rosject.
I didn't try it on Ubuntu 16 or 18.
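Regarding Python 2.7: the line the first traceback points at is an f-string, and f-strings only exist in Python 3.6 and later, so the pytesseract version that pip installed there most likely cannot be imported by a Python 2.7 interpreter at all. A small sketch to check the interpreter running your script (the function name supports_fstrings is just for illustration):

```python
import sys


def supports_fstrings(version_info=None):
    """Return True if the given (major, minor, ...) version can parse f-strings."""
    if version_info is None:
        version_info = sys.version_info
    # f-strings were introduced in Python 3.6.
    return tuple(version_info[:2]) >= (3, 6)


if __name__ == "__main__":
    print("Running Python %d.%d" % (sys.version_info[0], sys.version_info[1]))
    print("f-strings supported:", supports_fstrings())
```

If this reports False, running the script with python3 (and installing with pip3, as in the steps above) would be the first thing to try.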