Can't import pytesseract

duckfrost2 · February 25, 2021, 9:35am

To solve that issue you have to install the Tesseract OCR engine itself, since pytesseract is only a Python wrapper around the tesseract executable. The procedure to get this at least started is:

sudo apt update
sudo apt install tesseract-ocr
pip3 install pytesseract
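To confirm the install worked before going further, a small check like the one below may help. It is only a sketch that uses the standard library. It looks for the `tesseract` binary on PATH and, if it finds one, runs `tesseract --version`. The function name `tesseract_version` is hypothetical.

```python
import shutil
import subprocess
from typing import Optional


def tesseract_version(cmd: str = "tesseract") -> Optional[str]:
    """Return the first line of `<cmd> --version`, or None if the binary is missing."""
    path = shutil.which(cmd)
    if path is None:
        # Not on PATH: install it (sudo apt install tesseract-ocr) or point
        # pytesseract.pytesseract.tesseract_cmd at the full path of the binary.
        return None
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    # Depending on the Tesseract version, the banner may go to stdout or stderr.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else None


if __name__ == "__main__":
    print(tesseract_version() or "tesseract not found on PATH")
```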

Then download the test images:

cd ~/catkin_ws/src/
git clone https://github.com/madmaze/pytesseract.git
cd ~/catkin_ws/src/pytesseract/tests
cp -r data ~/catkin_ws/src/
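Note that `cp` needs `-r` to copy a directory. If you would rather do this step from Python, `shutil.copytree` performs the same recursive copy. The workspace paths below mirror the commands above and are assumptions about your layout. The helper name `copy_test_data` is hypothetical.

```python
import shutil
from pathlib import Path


def copy_test_data(src: Path, dst: Path) -> Path:
    """Recursively copy the directory src into dst, keeping its name (e.g. dst/data)."""
    target = dst / src.name
    # dirs_exist_ok (Python 3.8+) lets the copy be re-run without raising an error.
    shutil.copytree(src, target, dirs_exist_ok=True)
    return target


if __name__ == "__main__":
    ws = Path.home() / "catkin_ws" / "src"
    src = ws / "pytesseract" / "tests" / "data"
    if src.is_dir():
        copied = copy_test_data(src, ws)
        print(sorted(p.name for p in copied.iterdir()))
    else:
        print(f"{src} not found; clone the pytesseract repository first")
```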

Then run the test script:

try:
    from PIL import Image
except ImportError:
    import Image

import pytesseract
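If the import itself fails, the first step is to find out which module is missing. `importlib.util.find_spec` reports whether a top-level module can be found without actually importing it. The following is a minimal sketch built on that function. The helper name `missing_modules` is hypothetical. Also note that the pip package Pillow is imported as `PIL`.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found on sys.path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(["PIL", "pytesseract"])
    if missing:
        # e.g. pip3 install pillow pytesseract (Pillow provides the PIL package)
        print("Missing:", ", ".join(missing))
    else:
        print("PIL and pytesseract are importable")
```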

On my side I'm still having this issue; maybe you know how to fix it. I tried downloading the trained models from the tesseract-ocr/tessdata GitHub repository ("Trained models with support for legacy and LSTM OCR engine") and copying them into /usr/local/share/tessdata, but it didn't work:

['eng', 'osd']
Traceback (most recent call last):
  File "testeract_test.py", line 19, in <module>
    print(pytesseract.image_to_string(Image.open('data/test-european.jpg'), lang='fra'))
  File "/home/user/.local/lib/python3.8/site-packages/pytesseract/pytesseract.py", line 409, in image_to_string
    return {
  File "/home/user/.local/lib/python3.8/site-packages/pytesseract/pytesseract.py", line 412, in <lambda>
    Output.STRING: lambda: run_and_get_output(*args),
  File "/home/user/.local/lib/python3.8/site-packages/pytesseract/pytesseract.py", line 287, in run_and_get_output
    run_tesseract(**kwargs)
  File "/home/user/.local/lib/python3.8/site-packages/pytesseract/pytesseract.py", line 263, in run_tesseract
    raise TesseractError(proc.returncode, get_errors(error_string))
pytesseract.pytesseract.TesseractError: (1, 'Error opening data file /usr/share/tesseract-ocr/4.00/tessdata/fra.traineddata Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory. Failed loading language 'fra' Tesseract couldn't load any languages! Could not initialize tesseract.')
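The error message names the directory Tesseract actually searched (`/usr/share/tesseract-ocr/4.00/tessdata`). That is not the `/usr/local/share/tessdata` directory the models were copied into, and `get_languages` only lists `eng` and `osd` there. Two options may help, offered as suggestions rather than a confirmed fix. On Ubuntu, the language data is also packaged for apt as `tesseract-ocr-<lang>` (for French, `tesseract-ocr-fra`). Alternatively, pytesseract can point Tesseract at another directory by passing the `--tessdata-dir` option through the `config` argument of `image_to_string`. The sketch below checks which requested languages lack a `<lang>.traineddata` file and builds that config string. Both helper names are hypothetical.

```python
from pathlib import Path
from typing import Iterable, List

# Directory taken from the error message above; adjust it if your install differs.
DEFAULT_TESSDATA = Path("/usr/share/tesseract-ocr/4.00/tessdata")


def missing_traineddata(langs: Iterable[str], tessdata_dir: Path = DEFAULT_TESSDATA) -> List[str]:
    """Return the language codes that have no <lang>.traineddata file in tessdata_dir."""
    return [lang for lang in langs if not (tessdata_dir / f"{lang}.traineddata").is_file()]


def tessdata_config(tessdata_dir: Path) -> str:
    """Build a pytesseract `config` string pointing Tesseract at tessdata_dir."""
    return f'--tessdata-dir "{tessdata_dir}"'


if __name__ == "__main__":
    print("Missing in default dir:", missing_traineddata(["eng", "osd", "fra"]))
    # Usage with pytesseract (not run here):
    # pytesseract.image_to_string(img, lang='fra',
    #                             config=tessdata_config(Path('/usr/local/share/tessdata')))
    print("Config:", tessdata_config(Path("/usr/local/share/tessdata")))
```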

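That's all I got working. At least the problem doesn't seem to be with the system itself; I just don't know how to make Tesseract work properly. For reference, here is the rest of the test script, continuing after import pytesseract (line 19 in the traceback is the French example):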
# If you don't have the tesseract executable in your PATH, include the following:
pytesseract.pytesseract.tesseract_cmd = r'/usr/bin/tesseract'
# Example: tesseract_cmd = r'C:\Program Files (x86)\Tesseract-OCR\tesseract'

# Simple image to string
print(pytesseract.image_to_string(Image.open('data/test.png')))

# List of available languages
print(pytesseract.get_languages(config=''))

# French text image to string
print(pytesseract.image_to_string(Image.open('data/test-european.jpg'), lang='fra'))

# In order to bypass the image conversions of pytesseract, just use relative or absolute image path
# NOTE: In this case you should provide tesseract supported images or tesseract will return error
print(pytesseract.image_to_string('data/test.png'))

# Batch processing with a single file containing the list of multiple image file paths
print(pytesseract.image_to_string('data/images.txt'))

# Timeout/terminate the tesseract job after a period of time
try:
    print(pytesseract.image_to_string('data/test.jpg', timeout=2))  # Timeout after 2 seconds
    print(pytesseract.image_to_string('data/test.jpg', timeout=0.5))  # Timeout after half a second
except RuntimeError as timeout_error:
    # Tesseract processing is terminated
    pass

# Get bounding box estimates
print(pytesseract.image_to_boxes(Image.open('data/test.png')))

# Get verbose data including boxes, confidences, line and page numbers
print(pytesseract.image_to_data(Image.open('data/test.png')))

# Get information about orientation and script detection
print(pytesseract.image_to_osd(Image.open('data/test.png')))

# Get a searchable PDF
pdf = pytesseract.image_to_pdf_or_hocr('data/test.png', extension='pdf')
with open('test.pdf', 'w+b') as f:
    f.write(pdf)  # pdf type is bytes by default

# Get HOCR output
hocr = pytesseract.image_to_pdf_or_hocr('data/test.png', extension='hocr')

# Get ALTO XML output
xml = pytesseract.image_to_alto_xml('data/test.png')
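`image_to_data` also accepts `output_type=pytesseract.Output.DICT`, which returns parallel lists under keys such as `'text'`, `'conf'`, `'left'`, `'top'`, `'width'` and `'height'`. A common next step is keeping only the words Tesseract is reasonably sure about. The sketch below assumes that dict layout. The threshold of 60 and the helper name `confident_words` are arbitrary choices. The sample dict is hand-made, not real OCR output, so the function can be tried without an image.

```python
from typing import Dict, List, Sequence, Tuple


def confident_words(data: Dict[str, Sequence], min_conf: float = 60.0) -> List[Tuple[str, float]]:
    """Return (word, confidence) pairs whose confidence is at least min_conf.

    `data` is assumed to have the layout of
    pytesseract.image_to_data(img, output_type=pytesseract.Output.DICT).
    """
    words = []
    for text, conf in zip(data["text"], data["conf"]):
        # Depending on the pytesseract version, conf may be an int, float or str;
        # non-word rows (page/block/line levels) carry conf -1 and empty text.
        score = float(conf)
        if text.strip() and score >= min_conf:
            words.append((text, score))
    return words


if __name__ == "__main__":
    # Real usage (not run here):
    # data = pytesseract.image_to_data(Image.open('data/test.png'),
    #                                  output_type=pytesseract.Output.DICT)
    sample = {"text": ["", "Hello", "w0rld", "!"], "conf": [-1, 96.2, 41.0, "88"]}
    print(confident_words(sample))
```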

Here is the Rosject: https://app.theconstructsim.com/#/l/3db0d674/