Can't import pytesseract

ZisenZhou · February 25, 2021, 1:10am

Hi,
Do you support pytesseract in any way at the moment? I am able to install pytesseract through pip, but when I try to import it, it reports:
Traceback (most recent call last):
  File "test.py", line 2, in <module>
    import pytesseract
  File "/home/user/.local/lib/python2.7/site-packages/pytesseract/__init__.py", line 2, in <module>
    from .pytesseract import ALTONotSupported
  File "/home/user/.local/lib/python2.7/site-packages/pytesseract/pytesseract.py", line 89
    f"{tesseract_cmd} is not installed or it's not in your PATH."
^
I looked this up online, and it appears this problem usually occurs after pytesseract has been imported, when calling one of its functions. Although my case is different (here the error occurs while importing pytesseract), I tried their solution of adding the following line to the Python file:
pytesseract.pytesseract.tesseract_cmd = '/home/user/.local/lib/python2.7/site-packages/tesseract'
But it didn’t work, and the error persists.
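
A note on the tesseract_cmd setting above: it is meant to hold the path of the Tesseract executable itself, not a directory under site-packages, and it only takes effect once the import succeeds. Below is a minimal sketch for locating the binary with the standard library, assuming Python 3 and a system-wide Tesseract install. The helper name find_tesseract is made up for illustration.

```python
import shutil


def find_tesseract(binary_name="tesseract"):
    """Return the full path of the Tesseract executable, or None if it is not on PATH."""
    return shutil.which(binary_name)


path = find_tesseract()
if path is None:
    print("tesseract was not found on PATH; install it or set tesseract_cmd by hand")
else:
    # This is the kind of value pytesseract.pytesseract.tesseract_cmd expects.
    print("tesseract executable:", path)
```

If this prints a path such as /usr/bin/tesseract, there is usually no need to set tesseract_cmd at all.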

albertoezquerro · February 25, 2021, 8:58am

Hello @ZisenZhou ,

Are you working on your own rosject? If so, could you share your rosject here, along with instructions to reproduce this error, so that we (or other users) can help you better?

Best,

duckfrost2 · February 25, 2021, 9:35am

To solve that issue you have to install Tesseract OCR. The procedure to at least get it starting is:

sudo apt update
sudo apt install tesseract-ocr
pip3 install pytesseract

and then download the data images for testing:

cd ~/catkin_ws/src/
git clone https://github.com/madmaze/pytesseract.git
cd ~/catkin_ws/src/pytesseract/tests
cp -r data ~/catkin_ws/src/

And then execute the test:

try:
    from PIL import Image
except ImportError:
    import Image

import pytesseract

On my side I’m hitting the error below. Maybe you know how to fix it: I tried downloading the tesseract-ocr/tessdata repository from GitHub (trained models with support for the legacy and LSTM OCR engines) and copying it into /usr/local/share/tessdata, but that didn’t work.

['eng', 'osd']
Traceback (most recent call last):
  File "testeract_test.py", line 19, in <module>
    print(pytesseract.image_to_string(Image.open('data/test-european.jpg'), lang='fra'))
  File "/home/user/.local/lib/python3.8/site-packages/pytesseract/pytesseract.py", line 409, in image_to_string
    return {
  File "/home/user/.local/lib/python3.8/site-packages/pytesseract/pytesseract.py", line 412, in <lambda>
    Output.STRING: lambda: run_and_get_output(*args),
  File "/home/user/.local/lib/python3.8/site-packages/pytesseract/pytesseract.py", line 287, in run_and_get_output
    run_tesseract(**kwargs)
  File "/home/user/.local/lib/python3.8/site-packages/pytesseract/pytesseract.py", line 263, in run_tesseract
    raise TesseractError(proc.returncode, get_errors(error_string))
pytesseract.pytesseract.TesseractError: (1, 'Error opening data file /usr/share/tesseract-ocr/4.00/tessdata/fra.traineddata Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory. Failed loading language 'fra' Tesseract couldn't load any languages! Could not initialize tesseract.')
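
The error above says Tesseract looked for fra.traineddata in /usr/share/tesseract-ocr/4.00/tessdata/, which may explain why copying files into /usr/local/share/tessdata had no effect. Below is a small sketch for checking which language files a tessdata directory actually contains. The helper name missing_languages is hypothetical, and the directory is simply the one from the error message.

```python
import os


def missing_languages(tessdata_dir, langs):
    """Return the languages in `langs` that have no <lang>.traineddata file in tessdata_dir."""
    return [
        lang
        for lang in langs
        if not os.path.isfile(os.path.join(tessdata_dir, lang + ".traineddata"))
    ]


# Directory taken from the error message; adjust it for your installation.
tessdata_dir = "/usr/share/tesseract-ocr/4.00/tessdata"
print(missing_languages(tessdata_dir, ["eng", "osd", "fra"]))
```

If fra is reported as missing, either place fra.traineddata in that directory or set TESSDATA_PREFIX to the directory that holds it, as the error message suggests. On Ubuntu the French data is probably also available as an apt package (likely tesseract-ocr-fra, but that name is an assumption; check with apt search tesseract-ocr).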

That’s all I got working. At least the problem doesn’t seem to be related to the system, but rather to not knowing how to make Tesseract work properly. The rest of the test script:
# If you don’t have tesseract executable in your PATH, include the following:
pytesseract.pytesseract.tesseract_cmd = r'/usr/bin/tesseract'
# Example tesseract_cmd = r'C:\Program Files (x86)\Tesseract-OCR\tesseract'

# Simple image to string
print(pytesseract.image_to_string(Image.open('data/test.png')))

# List of available languages
print(pytesseract.get_languages(config=''))

# French text image to string
print(pytesseract.image_to_string(Image.open('data/test-european.jpg'), lang='fra'))

# In order to bypass the image conversions of pytesseract, just use relative or absolute image path
# NOTE: In this case you should provide tesseract supported images or tesseract will return error
print(pytesseract.image_to_string('data/test.png'))

# Batch processing with a single file containing the list of multiple image file paths
print(pytesseract.image_to_string('data/images.txt'))

# Timeout/terminate the tesseract job after a period of time
try:
    print(pytesseract.image_to_string('data/test.jpg', timeout=2))  # Timeout after 2 seconds
    print(pytesseract.image_to_string('data/test.jpg', timeout=0.5))  # Timeout after half a second
except RuntimeError as timeout_error:
    # Tesseract processing is terminated
    pass

# Get bounding box estimates
print(pytesseract.image_to_boxes(Image.open('data/test.png')))

# Get verbose data including boxes, confidences, line and page numbers
print(pytesseract.image_to_data(Image.open('data/test.png')))

# Get information about orientation and script detection
print(pytesseract.image_to_osd(Image.open('data/test.png')))

# Get a searchable PDF
pdf = pytesseract.image_to_pdf_or_hocr('data/test.png', extension='pdf')
with open('test.pdf', 'w+b') as f:
    f.write(pdf)  # pdf type is bytes by default

# Get HOCR output
hocr = pytesseract.image_to_pdf_or_hocr('data/test.png', extension='hocr')

# Get ALTO XML output
xml = pytesseract.image_to_alto_xml('data/test.png')

Here is the Rosject: https://app.theconstructsim.com/#/l/3db0d674/

ClubSand · February 25, 2021, 9:09pm

Hi, I’ve tried what you suggested but still got the same error: “{tesseract_cmd} is not installed or it’s not in your PATH.” Could the problem be that we’re using Python 2.7?

Thank you for your help.
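
The Python version does look like a plausible cause: the traceback in the first post stops at an f-string inside pytesseract.py, and f-strings only exist in Python 3.6 and later, so Python 2.7 would fail to parse the module at import time, before any Tesseract binary is looked up. Below is a minimal sketch of a check to run before the import. The function name supports_fstrings is made up for illustration.

```python
import sys


def supports_fstrings(version_info=sys.version_info):
    """f-strings were added in Python 3.6, so older interpreters cannot parse them."""
    return tuple(version_info[:2]) >= (3, 6)


if supports_fstrings():
    print("Interpreter is new enough for f-strings")
else:
    # Written with % formatting so that this file still parses on Python 2.7.
    print("Python %d.%d cannot parse f-strings; run the script with python3" % tuple(sys.version_info[:2]))
```

If the check fails, running the script with python3 (with pytesseract installed through pip3) would be the first thing to try.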

duckfrost2 · February 26, 2021, 8:52am

Hi,

I tested this in a Noetic Ubuntu 20 Rosject.
I didn’t try it on Ubuntu 16 or 18.