Automatic Scanned Document Data Extraction OCR NER in Python

Learn to build a Business Card Scanner app from scratch with Python, spaCy, and Pytesseract.

Welcome to the course “Automatic Scanned Document Data Extraction OCR NER in Python”!

What you’ll learn

  • Develop and train a Named Entity Recognition (NER) model.
  • Extract not only the text from an image but also the entities from a business card.
  • Develop a business card scanner like ABBYY from scratch.
  • High-level data preprocessing techniques for natural language problems.
  • Real-time NER apps.

Course Content

  • Introduction –> 2 lectures • 3min.
  • Project Setup –> 6 lectures • 17min.
  • Data Preprocessing –> 10 lectures • 59min.
  • Training Named Entity Model (NER) –> 11 lectures • 47min.
  • Predictions –> 13 lectures • 1hr.

Requirements

  • At least beginner-level Python.
  • An understanding of aggregation techniques with Pandas DataFrames.
  • Reading and writing images with OpenCV, and drawing rectangles on an image.

In this course you will learn how to develop a customized Named Entity Recognizer. The main idea of the course is to extract entities from scanned documents such as invoices, business cards, shipping bills, and bills of lading. For the sake of data privacy, we restrict our examples to business cards, but you can apply the framework explained here to all kinds of financial documents. Below is the curriculum we follow to develop the project.

 

Section 0: Setting Up the Project

  1. Install Python
  2. Install Dependencies

Section 1: Data Preprocessing

  1. Gather Images
  2. Overview on Pytesseract
  3. Extract Text from All Images
  4. Clean and Prepare text
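
As a rough illustration of steps 3 and 4, the sketch below runs Pytesseract on one image and cleans each recognized token. It is not the course's own code: the function names `clean_token` and `extract_tokens`, the list of stripped characters, and the choice to keep characters such as `@`, `.` and `-` are assumptions made for this example. `extract_tokens` additionally assumes that pytesseract, Pillow, and a local Tesseract installation are available.

```python
import string

# Characters stripped from every OCR token (an assumption for this sketch).
# '@', '.', '-', '/', ',' and '_' are kept on purpose because emails, URLs,
# and phone numbers rely on them.
UNWANTED_PUNCTUATION = "!#$%&'()*+:;<=>?[\\]^`{|}~\""
_STRIP_TABLE = str.maketrans("", "", string.whitespace + UNWANTED_PUNCTUATION)


def clean_token(token):
    """Remove whitespace and unwanted punctuation from one OCR token."""
    return str(token).translate(_STRIP_TABLE)


def extract_tokens(image_path):
    """Run Tesseract on one image and return cleaned tokens with their boxes.

    Requires pytesseract, Pillow, and a local Tesseract install.
    """
    import pytesseract  # imported lazily so clean_token works without it
    from PIL import Image

    data = pytesseract.image_to_data(
        Image.open(image_path), output_type=pytesseract.Output.DICT
    )
    rows = []
    for i, raw in enumerate(data["text"]):
        token = clean_token(raw)
        if not token:
            continue  # Tesseract emits empty text cells for layout levels
        rows.append({
            "token": token,
            "left": data["left"][i],
            "top": data["top"][i],
            "width": data["width"][i],
            "height": data["height"][i],
        })
    return rows


print(clean_token("(555)-0100"))  # prints 555-0100
```

Keeping the box coordinates next to each token is what later makes it possible to draw a rectangle around a predicted entity on the original image.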

Section 2: Train Named Entity Recognition Model

  1. Prepare Training Data for spaCy
  2. Train Model
    1. Config
    2. Train
    3. Save
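
To make the "prepare training data" step more concrete, here is a minimal sketch of one common way to do it, assuming the tokens were labeled with BIO tags (for example `B-NAME`, `I-NAME`, `O`) and that spaCy v3 is used. The tag scheme, the label names, and the helper names `bio_to_spans` and `save_docbin` are assumptions for illustration and may differ from what the course uses.

```python
def bio_to_spans(tokens, tags):
    """Turn BIO-tagged tokens into spaCy-style (text, {"entities": [...]}).

    Tokens are joined with single spaces; each entity is reported as
    (start_char, end_char, label) in the joined text.
    """
    entities = []
    current = None  # [start, end, label] of the entity being built
    cursor = 0
    for token, tag in zip(tokens, tags):
        start, end = cursor, cursor + len(token)
        if tag.startswith("B-"):
            if current:
                entities.append(tuple(current))
            current = [start, end, tag[2:]]
        elif tag.startswith("I-") and current and current[2] == tag[2:]:
            current[1] = end  # extend the running entity
        else:
            # "O" tags and stray "I-" tags close any running entity
            if current:
                entities.append(tuple(current))
            current = None
        cursor = end + 1  # +1 for the joining space
    if current:
        entities.append(tuple(current))
    return " ".join(tokens), {"entities": entities}


def save_docbin(examples, path):
    """Write examples to a .spacy file for `spacy train` (needs spaCy v3)."""
    import spacy  # imported lazily so bio_to_spans works without spaCy
    from spacy.tokens import DocBin

    nlp = spacy.blank("en")
    doc_bin = DocBin()
    for text, annotations in examples:
        doc = nlp.make_doc(text)
        spans = [
            doc.char_span(start, end, label=label)
            for start, end, label in annotations["entities"]
        ]
        # char_span returns None when offsets do not align with token borders
        doc.ents = [span for span in spans if span is not None]
        doc_bin.add(doc)
    doc_bin.to_disk(path)


example = bio_to_spans(
    ["John", "Smith", "CEO", "john@acme.com"],
    ["B-NAME", "I-NAME", "B-DES", "B-EMAIL"],
)
print(example)
```

With spaCy v3, files saved this way can typically be passed to the training command, for example `python -m spacy train config.cfg --output ./output --paths.train ./train.spacy --paths.dev ./dev.spacy`, where the file names are placeholders.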

Section 3: Prediction

  1. Load Model
  2. Render and Serve with displaCy
  3. Draw Bounding Box on Image
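
The prediction steps above could be sketched roughly as follows. This is an illustration under assumptions, not the course's implementation: `predict_entities`, `merge_boxes`, and `draw_entity` are hypothetical helper names, the boxes are assumed to be `(left, top, width, height)` tuples as produced by Tesseract, and the colors and font are arbitrary. The spaCy and OpenCV parts require those libraries to be installed.

```python
def predict_entities(model_dir, text):
    """Load a trained spaCy pipeline from disk and return its entities."""
    import spacy  # imported lazily; needs spaCy and a saved model directory

    nlp = spacy.load(model_dir)
    doc = nlp(text)
    return [(ent.text, ent.label_, ent.start_char, ent.end_char)
            for ent in doc.ents]


def merge_boxes(boxes):
    """Merge (left, top, width, height) boxes into one (x1, y1, x2, y2) box."""
    x1 = min(left for left, top, width, height in boxes)
    y1 = min(top for left, top, width, height in boxes)
    x2 = max(left + width for left, top, width, height in boxes)
    y2 = max(top + height for left, top, width, height in boxes)
    return x1, y1, x2, y2


def draw_entity(image, boxes, label):
    """Draw one rectangle around all tokens of an entity and write its label."""
    import cv2  # imported lazily; needs opencv-python

    x1, y1, x2, y2 = merge_boxes(boxes)
    cv2.rectangle(image, (x1, y1), (x2, y2), (0, 255, 0), 2)
    cv2.putText(image, label, (x1, max(y1 - 5, 0)),
                cv2.FONT_HERSHEY_PLAIN, 1, (255, 0, 255), 2)
    return image


# Two word boxes that belong to the same entity, e.g. a two-word name.
print(merge_boxes([(10, 20, 30, 10), (45, 22, 50, 12)]))  # prints (10, 20, 95, 34)
```

For the rendering step, `spacy.displacy.render(doc, style="ent")` highlights the entities of a processed `doc`, and `spacy.displacy.serve` starts a small local web server that shows the same view.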

 

Overview:

I will start the course by installing Python and the libraries needed to develop the end-to-end project. Then I will teach you one of the prerequisites of the course: image processing techniques in OpenCV and the mathematical concepts behind images. We will also do the necessary image analysis and the required preprocessing steps for the images. Then we will do a mini project on Face Detection using OpenCV and Deep Neural Networks.

With the image basics in place, we will then start project phase 1, face identity recognition. I will start this phase with preprocessing images, and we will extract features from the images using deep neural networks. Then, with the features of the faces, we will train different deep learning models such as a Convolutional Neural Network (CNN). I will teach you model selection and hyperparameter tuning for face recognition models.

Once our deep learning model is ready, we will move to Section 3 and write the code for performing predictions with the CNN model.

Finally, we will develop the desktop application and make predictions on a live video stream.

What are you waiting for? Start the course and develop your own Computer Vision Flask Desktop Application Project using Machine Learning and Python, and deploy it in the cloud with your own hands.
