How To Extract Text From Images In Python

Introduction

In this example I will show you how to extract text from images in a Python program. Text extraction from images can serve various purposes, for example data mining for machine learning projects, or reading the content of images for further processing in your applications.

To extract text from an image I am going to use the Python library pytesseract. Python-tesseract is an optical character recognition (OCR) tool for Python. That is, it will recognize and “read” the text embedded in images.

Python-tesseract is a wrapper for Google’s Tesseract-OCR Engine. It is also useful as a stand-alone invocation script to tesseract, as it can read all image types supported by the Pillow and Leptonica imaging libraries, including jpeg, png, gif, bmp, tiff, and others. Additionally, if used as a script, Python-tesseract will print the recognized text instead of writing it to a file.

Prerequisites

Python 3.9.5 – 3.9.7, Tesseract Installer

Download Tesseract and install it on your system. On Windows, the path to the executable will be something like C:\Program Files\Tesseract-OCR\tesseract.

Next, install pytesseract, the Python wrapper for Tesseract, using the command pip install pytesseract.
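Before running any OCR code, it can help to confirm where the Tesseract executable actually lives. The following is a minimal sketch using only the Python standard library. The helper name find_tesseract is hypothetical, and the fallback path is the default Windows location mentioned above with the .exe extension added; your installation may differ.

```python
import shutil
from pathlib import Path

# Assumption: default Windows install location; adjust if you installed elsewhere.
DEFAULT_WINDOWS_PATH = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def find_tesseract(fallbacks=(DEFAULT_WINDOWS_PATH,)):
    """Return the path to the tesseract executable, or None if it is not found."""
    # First check whether tesseract is reachable through the PATH variable.
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    # Otherwise try the known fallback locations one by one.
    for candidate in fallbacks:
        if Path(candidate).is_file():
            return str(candidate)
    return None


if __name__ == "__main__":
    location = find_tesseract()
    print(location or "Tesseract not found; check your installation")
```

If the function returns a path, that value is what you would assign to pytesseract's tesseract_cmd setting.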

Project Directory

Create a project root directory called python-extract-text-from-image in a location of your choice.

I may not mention the project’s root directory name in the subsequent sections, but I will assume that I am creating files with respect to the project’s root directory.

Python Script – Extract Text From Image

Now create a Python script file python-extract-text-from-image.py and write the following code into the script file.

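import pytesseract

# Point pytesseract at the Tesseract executable (default Windows install path).
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract'

# Run OCR on the image file and print the recognized text.
print(pytesseract.image_to_string('1.png'))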

I have imported the pytesseract library in the Python script. Next, I have set pytesseract’s tesseract_cmd to the path of the installed Tesseract executable.

Finally, I have used pytesseract’s image_to_string() function to print the text of the image.
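The same steps can be wrapped in a small reusable function. This is only a sketch under a few assumptions, not the code from this example: extract_text is a hypothetical helper name, Pillow is assumed to be installed (it comes with pytesseract as a dependency), and the English language data ("eng") is assumed to be available in your Tesseract installation.

```python
from pathlib import Path


def extract_text(image_path, lang="eng", tesseract_cmd=None):
    """Return the text recognized in the image at image_path."""
    path = Path(image_path)
    # Fail early with a clear message if the image does not exist.
    if not path.is_file():
        raise FileNotFoundError(f"Image not found: {path}")

    # Imported here so a missing image is reported before the OCR
    # dependencies are loaded.
    import pytesseract
    from PIL import Image

    # Optionally override the executable path, e.g. on Windows.
    if tesseract_cmd:
        pytesseract.pytesseract.tesseract_cmd = tesseract_cmd

    # Open the image with Pillow and pass it to Tesseract.
    with Image.open(path) as img:
        return pytesseract.image_to_string(img, lang=lang)


if __name__ == "__main__":
    try:
        print(extract_text("1.png"))
    except FileNotFoundError as err:
        print(err)
```

Passing a Pillow image instead of a file name makes it easier to crop or resize the image before recognition if you need to.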

Image used in this example:

[Image: python extract text from image]

Testing Text Extraction From Image

Execute the Python script and you will see the following output on the command line.

[Image: extract text from image in python]
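If you want to use the recognized text for further processing, you will usually need to tidy it up first, because raw OCR output can contain blank lines, surrounding whitespace and, depending on the Tesseract version, a trailing form feed character. Here is a small sketch of such a cleanup step. The helper name clean_ocr_text is hypothetical, and the sample string is made up for illustration; it is not the real output for the image used in this example.

```python
def clean_ocr_text(raw_text):
    """Return the non-empty lines of raw OCR output without surrounding whitespace."""
    # splitlines() also treats the form feed character (\x0c) as a line break.
    lines = (line.strip() for line in raw_text.splitlines())
    # Keep only the lines that still contain text.
    return [line for line in lines if line]


if __name__ == "__main__":
    # Made-up sample that imitates raw OCR output.
    sample = "Hello World\n\n  Second line  \n\x0c"
    print(clean_ocr_text(sample))  # prints ['Hello World', 'Second line']
```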

