How to Extract Text from Image with Python OCR: A Guide

In this article, we will learn how to extract text from images using Python OCR and take a closer look at the technology behind it. Python offers several OCR libraries for extracting text from different types of images and documents, and this tutorial walks through them step by step.

Let’s first understand the OCR technology in detail.

What is OCR Technology?

OCR stands for Optical Character Recognition. It is a technology that converts scanned or digital images, PDFs, and other documents into searchable, editable text. OCR software analyses the image of the text and converts it into machine-readable text that can be used for various purposes, such as document digitization, data entry, text mining, and more.

History of OCR

The history of OCR (Optical Character Recognition) dates back to the early 20th century. The first experimental OCR devices were developed in the 1920s and 1930s by Gustav Tauschek and Emanuel Goldberg. However, the technology only became widely used in the 1950s and 1960s.

David Shepard developed one of the earliest successful OCR machines at IBM in the 1950s. The machine used a photoelectric cell to detect the presence or absence of ink on a page and converted it into digital signals. This technology was used to automate the processing of checks and other financial documents.

Advantages of OCR

Here are some advantages of OCR technology:

  • Increases efficiency and saves time compared to manual data entry
  • Improves accuracy and reduces errors
  • Reduces costs associated with manual data entry and physical document storage
  • Makes documents searchable and information easy to locate
  • Improves accessibility for people with visual impairments or disabilities

Disadvantages of OCR

Below are some disadvantages of OCR technology:

  • Potential errors in character recognition
  • Difficulty recognizing formatting such as tables or graphs
  • Limitations with specific languages or character sets
  • The initial investment required in hardware and software
  • Need for quality control measures to ensure accuracy

Real-World Examples of Python OCR

Python OCR (Optical Character Recognition) is used in many real-world applications. Here are some examples:

  • Digital document management
  • Invoice processing
  • Automated check processing
  • ID verification
  • Text recognition in images
  • Handwriting recognition

Now let’s learn how OCR technology works in practice.

How does OCR Technology work?

In simple terms, OCR works by analyzing the patterns of light and dark pixels in an image and using this information to recognize characters and words. The process typically involves the following steps:

  1. Preprocessing: The image is first processed to remove any noise or distortion and to enhance the contrast between the text and the background.
  2. Segmentation: The image is then analyzed to identify the characters and words that make up the text.
  3. Recognition: Each character is compared to a database of known characters and matched with the closest match.
  4. Postprocessing: The recognized text is then processed to correct errors and improve accuracy.
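The four steps can be illustrated with a deliberately tiny, pure-Python sketch. It is not how production OCR engines work internally; the 3x5 glyph templates, the vocabulary, and all function names below are made up for illustration, and the "image" is just a list of rows of grayscale values.

```python
# Toy illustration of the four OCR stages on a tiny grayscale "image".
# The 3x5 glyph templates and all helper names are invented for this sketch.

TEMPLATES = {
    "I": ["111", "010", "010", "010", "111"],
    "L": ["100", "100", "100", "100", "111"],
    "T": ["111", "010", "010", "010", "010"],
}

def binarize(gray, threshold=128):
    """Preprocessing: dark pixels (ink) become 1, light pixels become 0."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]

def segment(binary):
    """Segmentation: split into glyphs at columns that contain no ink."""
    width = len(binary[0])
    glyphs, start = [], None
    for x in range(width + 1):
        has_ink = x < width and any(row[x] for row in binary)
        if has_ink and start is None:
            start = x
        elif not has_ink and start is not None:
            glyphs.append([row[start:x] for row in binary])
            start = None
    return glyphs

def recognize(glyph):
    """Recognition: pick the template with the fewest differing pixels."""
    def distance(template):
        return sum(
            int(t) != g
            for t_row, g_row in zip(template, glyph)
            for t, g in zip(t_row, g_row)
        )
    return min(TEMPLATES, key=lambda ch: distance(TEMPLATES[ch]))

def postprocess(text, vocabulary=("LIT", "TILT")):
    """Postprocessing: snap to a known word if at most one character differs."""
    for word in vocabulary:
        if len(word) == len(text) and sum(a != b for a, b in zip(word, text)) <= 1:
            return word
    return text

def toy_ocr(gray):
    """Run all four stages and return the recognized string."""
    glyphs = segment(binarize(gray))
    return postprocess("".join(recognize(g) for g in glyphs))
```

Real engines replace the fixed templates with trained models and the tiny vocabulary with language models or dictionaries, but the flow of data through the stages is similar.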

OCR technology can be implemented in various ways using different algorithms and approaches. Some OCR systems use a rule-based approach, where the recognition process is guided by a set of rules describing each character’s characteristics. Other systems use machine learning algorithms, where the system learns from a large dataset of known characters and uses this information to recognize new characters.

Now that we have covered how OCR technology works, let’s start coding and see how Python’s OCR libraries can simplify our work.

Here is the list of the best OCR libraries in Python:

List of Best Python OCR Libraries

  1. Tesseract (pytesseract)
  2. EasyOCR
  3. PaddleOCR
  4. Doctr OCR
  5. Keras-OCR

Let’s learn them all step by step:

Python Tesseract Tutorial

This blog post will explore using Tesseract OCR in Python to extract text from images.

What is Tesseract OCR and How to Install It

Tesseract OCR is an open-source OCR engine originally developed at Hewlett-Packard and later sponsored by Google. It is widely used for OCR tasks because of its high accuracy and ability to recognize more than 100 languages. Using it from Python takes two parts: the pytesseract wrapper, which you can install using pip, a package installer for Python, and the Tesseract engine itself.

To install pytesseract, run the following command in your command prompt or terminal:

pip install pytesseract

But before running the Python code, we must install Tesseract on the respective operating system.

To install Tesseract on Windows, follow the steps below.

  1. Download the Tesseract exe from https://github.com/tesseract-ocr/tesseract.
  2. Install this exe in C:\Program Files (x86)\Tesseract-OCR

To install Tesseract on Linux, follow the steps below.

  1. Open a terminal in Linux.
  2. Run the command below to install Tesseract.

apt-get install tesseract-ocr

Once you have installed pytesseract and tesseract, you can start using them to extract text from images.
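If pytesseract cannot find the Tesseract executable (this mostly happens on Windows when the install folder is not on PATH), you can point it at the binary through its tesseract_cmd setting. The sketch below is one possible way to do that; the helper names locate_tesseract and configure_pytesseract are hypothetical, and the fallback path simply reuses the install folder from the Windows steps above.

```python
import shutil

# Install location taken from the Windows steps above; adjust it if you chose another folder.
WINDOWS_DEFAULT = r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe"

def locate_tesseract(fallback=WINDOWS_DEFAULT):
    """Return the tesseract executable found on PATH, or the fallback path."""
    return shutil.which("tesseract") or fallback

def configure_pytesseract(path=None):
    """Tell pytesseract which tesseract binary to run and return that path."""
    import pytesseract  # imported here so locate_tesseract works without it
    pytesseract.pytesseract.tesseract_cmd = path or locate_tesseract()
    return pytesseract.pytesseract.tesseract_cmd
```

Calling configure_pytesseract() once at the start of a script is enough; on Linux the executable is normally found on PATH, so the fallback is never used.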

Extract Text from Images using Python Tesseract OCR

To extract text from an image using Tesseract OCR, you first need to import the pytesseract module in your Python script. The module provides a function called image_to_string that takes an image as input and returns the extracted text as output.

Here is an example of how to use the image_to_string function:

Example: how to use tesseract ocr in Python

import pytesseract
from PIL import Image

# Open the image with Pillow and run Tesseract on it
image = Image.open('image.jpg')
text = pytesseract.image_to_string(image)
print(text)

In this example, we open an image file called “image.jpg” using Pillow (the PIL module, short for Python Imaging Library). We then pass this image object to the image_to_string function of pytesseract, which returns the extracted text. Finally, we print the extracted text to the terminal.

Advanced Features of Tesseract OCR in Python

Tesseract OCR in Python provides several advanced features that can improve the accuracy and performance of OCR tasks. One such feature is the image_to_osd function, which extracts an image’s orientation and script detection (OSD) information. This information can be used to correct the image’s rotation and so improve OCR accuracy.

Here is an example of how to use the image_to_osd function:

import pytesseract
from PIL import Image

# Ask Tesseract for orientation and script detection (OSD) information
image = Image.open('image.jpg')
osd = pytesseract.image_to_osd(image)
print(osd)

The output will look like this:

Page number: 0
Orientation in degrees: 0
Rotate: 0
Orientation confidence: 1.05
Script: Cyrillic
Script confidence: 6.67

We use the image_to_osd function in this example to extract the orientation and script detection information. We then print this information to the console.
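Because image_to_osd returns this information as plain text, it usually has to be parsed before it can drive any logic. A minimal sketch follows, assuming the "key: value" line format shown in the output above; parse_osd is a hypothetical helper name, not part of pytesseract.

```python
def parse_osd(osd_text):
    """Turn the 'key: value' lines returned by image_to_osd into a dict of strings."""
    info = {}
    for line in osd_text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines that do not contain a colon
            info[key.strip()] = value.strip()
    return info

# Sample text copied from the output shown above
sample = """Page number: 0
Orientation in degrees: 0
Rotate: 0
Orientation confidence: 1.05
Script: Cyrillic
Script confidence: 6.67"""

info = parse_osd(sample)
rotate = int(info["Rotate"])                      # rotation suggested by Tesseract
confidence = float(info["Orientation confidence"])
print(info["Script"], rotate, confidence)
```

A script could, for example, rotate the image only when the "Rotate" value is non-zero and the orientation confidence is above a threshold of your choosing.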

OCR with pytesseract can also be parallelised to improve performance on many images. To do so, import the ThreadPool class from Python’s multiprocessing.pool module and create a pool of worker threads. You can then use the imap method of the pool object to apply the OCR function to a list of images in parallel.

The open-source Tesseract engine is quite popular in the Python community, but it has a few drawbacks: it runs on the CPU and cannot use a GPU, so it is slow compared to some other OCR engines.

For this reason, many people ask how to extract text from an image in Python without Tesseract, and what the alternatives to Tesseract OCR are.
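The thread pool approach can be sketched as follows. The function names ocr_file and ocr_many are hypothetical; the pool itself comes from Python’s standard multiprocessing.pool module. Threads are a reasonable fit here because pytesseract runs the tesseract program as a separate process for each call.

```python
from multiprocessing.pool import ThreadPool

def ocr_file(path):
    """Run Tesseract on one image file and return (path, text)."""
    import pytesseract          # imported lazily; requires pytesseract and Pillow
    from PIL import Image
    with Image.open(path) as image:
        return path, pytesseract.image_to_string(image)

def ocr_many(paths, worker=ocr_file, threads=4):
    """Apply worker to every path using a pool of threads, keeping the input order."""
    with ThreadPool(threads) as pool:
        return list(pool.imap(worker, paths))

# Example usage (assumes these files exist):
# for path, text in ocr_many(["page1.png", "page2.png"]):
#     print(path, len(text))
```

Passing the worker as a parameter also makes it easy to swap in a different OCR function, or a stub when testing.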

Python EasyOCR Tutorial

In this blog post, we will explore how to use EasyOCR in Python to extract text from images.

What is EasyOCR and How to Install It

EasyOCR is a Python library that uses deep learning and OCR technology to extract text from images. It is easy to use and can recognize text in more than 70 languages. Installing EasyOCR in Python is a straightforward process. You can download and install it using pip, a package installer for Python.

To download EasyOCR in Python, you need to run the following command in your command prompt or terminal:

pip install easyocr

Also, if you encounter an error like “ModuleNotFoundError: No module named 'easyocr'”, the package is not installed in the active Python environment; running the command above resolves it.

Once you have installed EasyOCR, you can start using it to extract text from images.

Extracting Text from Images using EasyOCR

To extract text from an image using EasyOCR, you first need to import the easyocr module in your Python script and create a Reader object for the languages you need. The Reader provides a method called readtext that takes an image as input and returns the detected text as output.

Here is an example of how to use the readtext function:

import easyocr

# Create a reader for English (the models are downloaded on first use)
reader = easyocr.Reader(['en'])
# readtext accepts a file path, so the image does not need to be opened first
text = reader.readtext('image.jpg')
print(text)

In this example, we create a Reader for English and pass the path of an image file called “image.jpg” to its readtext method, which returns the detected text. Finally, we print the result to the console.
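By default, readtext returns a list with one entry per detected text region, each holding the bounding box, the recognized text, and a confidence score. A common follow-up step is to keep only the confident detections; the helper below is a hypothetical sketch of that, and the sample data is invented for illustration.

```python
def confident_text(results, min_confidence=0.5):
    """Keep the text of detections whose confidence is at least min_confidence."""
    return [text for _bbox, text, confidence in results if confidence >= min_confidence]

# Made-up sample in the (bounding box, text, confidence) shape described above
sample_results = [
    ([[0, 0], [80, 0], [80, 20], [0, 20]], "Invoice", 0.98),
    ([[0, 30], [40, 30], [40, 50], [0, 50]], "N0.", 0.31),
    ([[50, 30], [90, 30], [90, 50], [50, 50]], "1042", 0.87),
]

print(confident_text(sample_results))
```

With real data you would call it as confident_text(reader.readtext('image.jpg')), and tune min_confidence to your documents.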

Advanced Features of EasyOCR in Python

EasyOCR in Python provides several advanced features that can improve the accuracy and performance of OCR tasks. One such feature is the ability to use a GPU for faster processing.

To use a GPU with EasyOCR, you must install the CUDA toolkit and cuDNN library on your computer. You can then set the gpu parameter of the Reader class to True.

Here is an example of how to use a GPU with EasyOCR:

import easyocr

# gpu=True asks EasyOCR to run its models on a CUDA-capable GPU
reader = easyocr.Reader(['en'], gpu=True)
text = reader.readtext('image.jpg')
print(text)

In this example, we set the gpu parameter of the Reader class to True, which tells EasyOCR to use a GPU for processing. This can significantly improve the speed of OCR tasks.

EasyOCR can also be tried on handwritten text, although accuracy on handwriting is usually lower than on printed text. Here is an example of performing handwriting OCR using EasyOCR in Python:

import easyocr

# initialise the reader and specify the languages to be used
reader = easyocr.Reader(['en'], gpu=False)

# path of the image containing the handwritten text
img_path = 'handwriting.png'

# perform OCR on the image; detail=0 returns only the recognized strings
results = reader.readtext(img_path, detail=0)

# print the results
print(results)

In this example, we initialize an instance of the EasyOCR reader and specify the languages to use. Then, we pass the path of the handwriting image to the readtext method of the reader instance, with detail=0 so that only the recognized text is returned, without bounding boxes or confidence scores. Finally, we print the results.

Transformer-based OCR models from the Hugging Face Transformers library, which provides state-of-the-art models for natural language processing and related tasks, are a separate option and are not part of EasyOCR itself. Also, if you see a message such as “CUDA not available”, your system either has no GPU or the GPU is not configured correctly; EasyOCR then falls back to the CPU.

EasyOCR vs Tesseract Engines

Comparison | EasyOCR | Tesseract OCR
Age | Newer OCR engine | More than two decades old
Installation | Easy to install using pip | Requires downloading and installing Tesseract OCR engine
Usage | readtext function to extract text from images | image_to_string function in pytesseract library to extract text from images
Accuracy | Accurate text recognition | Accurate text recognition
Performance | Faster, thanks to deep learning technology and GPU support | Slower than EasyOCR
Language Support | Supports more than 70 languages | Supports more than 100 languages but requires installation of language data files

Python PaddleOCR Tutorial

In this blog post, we will introduce you to PaddleOCR, a popular OCR tool developed by Baidu Research, and show you how to use it with Python to extract text from images.

What is PaddleOCR?

PaddleOCR is a Python-based OCR toolkit developed by Baidu Research. It is an open-source project that provides state-of-the-art OCR models and algorithms for detecting and recognizing image text. The toolkit is built on PaddlePaddle, a deep learning platform developed by Baidu, which offers easy-to-use APIs for training and deploying OCR models. PaddleOCR supports various OCR tasks, including text detection, recognition, and end-to-end OCR.

One of the critical advantages of PaddleOCR is its ease of use. The toolkit provides pre-trained models for various OCR tasks, which can be easily downloaded and used in your projects. PaddleOCR also supports GPU acceleration, which significantly speeds up the OCR process.

How to use PaddleOCR with Python

Now that you have a basic understanding of PaddleOCR, let’s see how you can use it with Python to extract text from images.

The first step is to install PaddleOCR. PaddleOCR runs on the PaddlePaddle framework, so install both packages using pip by running the following commands:

pip install paddlepaddle
pip install paddleocr

Once you have installed PaddleOCR, you can load a pre-trained OCR model and use it to extract text from an image. Here’s an example:

from paddleocr import PaddleOCR

# Load the text detector and recognizer
ocr = PaddleOCR(lang='en')

# Read the image and extract text
result = ocr.ocr('image.jpg')

# Print the extracted text
# (in PaddleOCR 2.x the result holds one list of [box, (text, score)] entries per image;
# the layout differs in other versions)
for page in result:
    for line in page:
        print(line[1][0])

In this example, we first load the OCR model by creating a PaddleOCR object. We then pass an image file to its ocr() method to extract text from the image. Finally, we print the text of each detected line.
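The result returned by ocr() is a nested structure, and its exact layout has changed between PaddleOCR releases. The sketch below assumes the 2.x layout, where the result holds one list per image and each entry pairs a bounding box with a (text, score) tuple; paddle_lines is a hypothetical helper, and the sample data is invented.

```python
def paddle_lines(result, min_score=0.0):
    """Flatten a PaddleOCR 2.x-style result into a list of text lines.

    Assumed layout: [ [ [box, (text, score)], ... ], ... ] with one inner list
    per image (an image without text may be represented by None).
    """
    texts = []
    for page in result:
        for _box, (text, score) in page or []:
            if score >= min_score:
                texts.append(text)
    return texts

# Made-up sample in the assumed layout
sample = [[
    [[[0, 0], [90, 0], [90, 20], [0, 20]], ("Total due", 0.97)],
    [[[0, 30], [60, 30], [60, 50], [0, 50]], ("$ 42.00", 0.64)],
]]

print(paddle_lines(sample, min_score=0.7))
```

If your installed version returns a different structure, print the raw result once and adapt the loop accordingly.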

PaddleOCR vs EasyOCR

Here we will see a comparison of EasyOCR vs PaddleOCR

Comparison | EasyOCR | PaddleOCR
Open-source | Yes | Yes
Language support | Multiple | Multiple
Ease of use | Easy | Easy
Accuracy | High | High
Performance | Slower | Faster
Ease of installation | Easier | Requires additional steps
GPU support | Limited | Better
Model size | Larger | Smaller

Python Doctr OCR Tutorial

In this blog post, we will introduce you to Doctr OCR, a popular OCR tool used with Python to extract text from images.

What is Doctr?

Doctr is an open-source Python package for OCR developed by Mindee, a company specialising in document analysis. Doctr is designed to make OCR accessible to developers and data scientists by providing a simple API for processing various document types. Rather than building on Tesseract, Doctr uses its own deep learning models for text detection and text recognition, and offers features such as automatic layout analysis and image preprocessing.

Installing Doctr

To get started with Doctr, you first need to install it. You can install Doctr using pip, the Python package installer. Simply open your terminal or command prompt and enter the following command:

pip install python-doctr

How to Use Doctr for OCR Document Processing

from doctr.io import DocumentFile
from doctr.models import ocr_predictor

# Load a pretrained text detection + recognition model
model = ocr_predictor(pretrained=True)
# PDF
doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
# Analyze
result = model(doc)
print(result)

In this example, we first load a pretrained model with ocr_predictor(), then create a document from a PDF file with DocumentFile.from_pdf() and pass it to the model. The result, a structured object describing the recognized pages, is then printed to the console.
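To get plain words out of the structured result, one option is to walk the dictionary produced by result.export(). The sketch below assumes that this dictionary nests pages, blocks, lines, and words, with each word carrying a "value" and a "confidence"; check this against your installed Doctr version. The helper name words_from_export and the sample data are hypothetical.

```python
def words_from_export(exported, min_confidence=0.0):
    """Collect word strings from a Doctr-style export dictionary.

    Assumed layout: pages -> blocks -> lines -> words, where every word is a
    dict with at least the keys "value" and "confidence".
    """
    words = []
    for page in exported["pages"]:
        for block in page["blocks"]:
            for line in block["lines"]:
                for word in line["words"]:
                    if word["confidence"] >= min_confidence:
                        words.append(word["value"])
    return words

# Made-up sample in the assumed layout
sample_export = {
    "pages": [{
        "blocks": [{
            "lines": [{
                "words": [
                    {"value": "Hello", "confidence": 0.99},
                    {"value": "wor1d", "confidence": 0.42},
                ]
            }]
        }]
    }]
}

print(words_from_export(sample_export, min_confidence=0.5))
```

With a real document you would call words_from_export(result.export()).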

OCR for Scanned Documents

Doctr is particularly useful for OCR processing of scanned documents. Scanned documents are essentially images of text, so OCR processing is necessary to extract the text and convert it into a machine-readable format.

Doctr uses advanced machine learning algorithms to recognise text even in complex layouts and low-quality scans. It can also handle documents with multiple languages and scripts, making it a versatile tool for international organisations.

PaddleOCR vs Doctr OCR

Feature | PaddleOCR | Doctr OCR
Open-source | Yes | Yes
Language Support | 65+ languages | English, French
Pre-trained models | Yes | Yes
GPU Support | Yes | Yes
Accuracy | High | High
OCR for scanned documents | Yes | Yes
OCR for handwritten text | Yes | No
API | Yes | Yes
Customizability | Limited | High
Performance | Fast | Fast

Python Keras-OCR Tutorial

In this blog post, we’ll explore Keras-OCR, its features, and how to use it to extract text from images using Python.

What is Keras-OCR?

Keras-OCR is a Python library that leverages Keras and TensorFlow to perform image OCR. It provides an end-to-end pipeline for OCR, including image pre-processing, text detection, and text recognition. Keras-OCR is user-friendly and easy to use, making it a popular choice for OCR tasks.

How to Install Keras-OCR

To install Keras-OCR, you need to have Python 3.6 or higher installed. Then, you can install Keras-OCR using pip, a popular Python package manager. Here’s how to do it:

pip install keras-ocr

Using Keras-OCR

Keras-OCR provides a simple API for performing OCR on images. The first step is to create an instance of the keras_ocr.pipeline.Pipeline class, which is the core component of the library. Here’s an example of how to do it:

import keras_ocr

pipeline = keras_ocr.pipeline.Pipeline()

The next step is to load an image and pass it through the pipeline. Here’s an example:

import matplotlib.pyplot as plt
import keras_ocr

# "pipeline" is the Pipeline instance created in the previous snippet

# Load the image (read takes a local file path or a URL as its argument)
image = keras_ocr.tools.read('https://upload.wikimedia.org/wikipedia/commons/1/13/OCR_sample2.png')

# Perform OCR; the result holds one list of (word, box) pairs per input image
predictions = pipeline.recognize([image])

# Show the image and OCR output
fig, ax = plt.subplots(figsize=(20, 20))
keras_ocr.tools.drawAnnotations(image=image, predictions=predictions[0], ax=ax)
plt.show()

This code will load the image, pass it through the pipeline, and display it with the OCR output overlaid. Note that the read function loads the image and accepts various inputs, such as a local file path or a URL.
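The pipeline returns the detected words as (word, box) pairs, where each box is a set of four corner points, and the words are not necessarily in reading order. The following is a rough sketch of grouping them into lines; reading_order and its line_tolerance parameter are hypothetical, and the simple top-edge comparison assumes roughly horizontal text.

```python
def reading_order(prediction, line_tolerance=10):
    """Sort (word, box) pairs top-to-bottom, then left-to-right within a line.

    Each box is a sequence of four (x, y) corner points. Words whose top edges
    lie within line_tolerance pixels of the first word of a line share that line.
    """
    items = []
    for word, box in prediction:
        xs = [point[0] for point in box]
        ys = [point[1] for point in box]
        items.append((min(ys), min(xs), word))
    items.sort()

    lines, current, current_top = [], [], None
    for top, left, word in items:
        if current and top - current_top > line_tolerance:
            lines.append(current)   # the word starts a new line
            current = []
        if not current:
            current_top = top
        current.append((left, word))
    if current:
        lines.append(current)
    return [" ".join(w for _, w in sorted(line)) for line in lines]

# Made-up sample: three words on two lines, given out of order
sample = [
    ("world", [[60, 2], [100, 2], [100, 20], [60, 20]]),
    ("hello", [[0, 0], [50, 0], [50, 20], [0, 20]]),
    ("again", [[0, 40], [50, 40], [50, 60], [0, 60]]),
]

print(reading_order(sample))
```

With real output you would call reading_order(predictions[0]) and adjust line_tolerance to the text size in your images.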

FAQs

Q: How to do Python OCR without Tesseract?

A: While Tesseract is a popular OCR engine for Python, other OCR libraries and tools are available to perform OCR tasks without Tesseract. Here are some alternatives: PaddleOCR, EasyOCR, and Doctr OCR.

Q: How to extract text from an image with Python OpenCV?

A: OpenCV is an image processing library, not an OCR engine, so it cannot recognize text on its own. You can, however, use OpenCV to detect where text appears in an image and then pass those regions to an OCR library.

Q: Can I use OCR for handwriting recognition in Python?

A: Yes, OCR can be used for handwriting recognition in Python. However, the accuracy of OCR in recognizing handwriting can vary depending on the quality of the handwriting, the language used, and the OCR library used. Tesseract is designed mainly for printed text, so deep-learning-based libraries such as PaddleOCR and EasyOCR are usually a better starting point for handwriting.

Q: Can I use OCR to extract text from PDFs?

A: OCR can extract text from PDFs containing scanned images or non-searchable text. OCR software can convert the images or non-searchable text in a PDF into searchable text, making it easier to read and edit. Doctr can read a PDF directly through DocumentFile.from_pdf; with other libraries, such as Tesseract, you typically convert the PDF pages to images first and then run OCR on each page.

Q: Is Keras-OCR suitable for real-time text recognition?

A: Keras-OCR can recognize text in near real time on suitable hardware, but because it relies on deep learning models, it may be too slow for high-speed, real-time text recognition tasks, where a lighter solution can be more efficient.