Document Scanner

NOTE: Please reload the page when you jump from one lecture to another so that the code highlighter is loaded! Sorry for the inconvenience.

In this part, we will create a simple document scanner. This can be useful, for example, for scanning pages in a book.

The steps that we need to follow to build this project are:

Convert the image to grayscale
Find the edges in the image
Use the edges to find all the contours
Select only the contours of the document
Apply warp perspective to get the top-down view of the document

Load the Image

Let's get started. Create a new file inside the document-scanner directory, name it scanner.py and put the following code:

from imutils.perspective import four_point_transform
import cv2

height = 800
width = 600
green = (0, 255, 0)

image = cv2.imread("input/2.jpg")
image = cv2.resize(image, (width, height))
orig_image = image.copy()

We start by importing the OpenCV library and the four_point_transform helper function from the imutils package.

This function will help us perform a 4 point perspective transform to obtain the top-down view of the document.

Then we set the height and width of the image so that we can resize it. We also create the green variable for the contour display later on.

Next, we load our image, resize it, and take a copy of it. This will allow us later to display the contours of the document on the original image rather than the modified image.

Image Processing

Now we can start preprocessing our image by converting it to grayscale, blurring it, and then finding the edges in the image. Let's see how to do it:

gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY) # convert the image to gray scale
blur = cv2.GaussianBlur(gray, (5, 5), 0) # Add Gaussian blur
edged = cv2.Canny(blur, 75, 200) # Apply the Canny algorithm to find the edges

# Show the image and the edges
cv2.imshow('Original image:', image)
cv2.imshow('Edged:', edged)
cv2.waitKey(0)
cv2.destroyAllWindows()

Now that our image is loaded we start by converting it from the RGB color to grayscale.

Next, to remove noise from the image, we smooth it by using the cv2.GaussianBlur function.

Then we apply the Canny edge detector. This is a multi-stage algorithm that is used to remove noise and detect edges in the image.

Finally, we display our images in a window and wait for a key to close the window.

Below you can see the output that we get:

image and edged image

Use the Edges to Find all the Contours

Now we can use our edged image to find all the contours.

# If you are using OpenCV v3, v4-pre, or v4-alpha
# cv2.findContours returns a tuple with 3 element instead of 2
# where the `contours` is the second one
# In the version OpenCV v2.4, v4-beta, and v4-official
# the function returns a tuple with 2 element 
contours, _ = cv2.findContours(edged, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
contours = sorted(contours, key=cv2.contourArea, reverse=True)

# Show the image and all the contours
cv2.imshow("Image", image)
cv2.drawContours(image, contours, -1, green, 3)
cv2.imshow("All contours", image)
cv2.waitKey(0)
cv2.destroyAllWindows()

To find the contours on the image we apply the cv2.findContours function.

The cv2.drawContours function allows us to draw contours on the image.

Let's see what we get so far:

contours

Cool! Let's keep going.

Select Only the Edges of the Document

Now we need to find the biggest contour in the image that will define our document. Here is how to do it:

# go through each contour
for contour in contours:
    # we approximate the contour
    peri = cv2.arcLength(contour, True)
    approx = cv2.approxPolyDP(contour, 0.05 * peri, True)
    # if we found a contour with 4 points we break the for loop
    # (we can assume that we have found our document)
    if len(approx) == 4:
        doc_cnts = approx
        break

Here we use the cv2.arcLength function to compute the perimeter of the contour. It takes as first argument the contour, and the second argument is just a boolean to tell the function whether the contour is closed or not. True means that the contour is closed.

Then we use the cv2.approxPolyDP function to get the approximation of the contour with another contour with fewer vertices.

This function takes 3 arguments: the first one is the contour we want to approximate, the second argument is to specify the approximation accuracy. In our case, we are approximating the contour with an accuracy that is proportional to the contour perimeter (0.05 * perimeter).

The last argument is a boolean to specify whether the approximated contour is closed or not.

Finally, we check if the approximated contour has four points. If so, we can assume with confidence that we have found our document (we break the for loop).

You can start to see the limits of our algorithm ...

For example, If we use a document that is not a rectangle, our technique wouldn't work.

Apply Warp Perspective to Get the Top-Down View of the Document

Now we are ready to apply the four_point_transform function to get the top-down view:

# We draw the contours on the original image not the modified one
cv2.drawContours(orig_image, [doc_cnts], -1, green, 3)
cv2.imshow("Contours of the document", orig_image)
# apply warp perspective to get the top-down view
warped = four_point_transform(orig_image, doc_cnts.reshape(4, 2))
# convert the warped image to grayscale
warped = cv2.cvtColor(warped, cv2.COLOR_BGR2GRAY)
cv2.imshow("Scanned", cv2.resize(warped, (600, 800)))
cv2.waitKey(0)
cv2.destroyAllWindows()

Basically, the four_point_transform function takes an image and a contour as input and returns the top-down view of the image.

Here is what we get:

top down view

Bonus

This simple program will loop over a directory (named input), find images in that directory, apply the technique we saw in this tutorial to get the top-down view of each document, and put it in a new directory (named output).

from pathlib import Path
import os

# ...

valid_formats = [".jpg", ".jpeg", ".png"]
get_text = lambda f: os.path.splitext(f)[1].lower()

img_files = ['input/' + f for f in os.listdir('input') if get_text(f) in valid_formats]
# create a new folder that will contain our images
Path("output").mkdir(exist_ok=True)

# go through each image file
for img_file in img_files:
    # read, resize, and make a copy of the image
    img = cv2.imread(img_file)
    img = cv2.resize(img, (width, height))
    orig_img = img.copy()

    # preprocess the image
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    blur = cv2.GaussianBlur(gray, (5, 5), 0)
    edged = cv2.Canny(img, 75, 200)

    # find and sort the contours
    contours, _ = cv2.findContours(edged, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
    contours = sorted(contours, key=cv2.contourArea, reverse=True)
    # go through each contour
    for contour in contours:
        # approximate each contour
        peri = cv2.arcLength(contour, True)
        approx = cv2.approxPolyDP(contour, 0.05 * peri, True)
        # check if we have found our document
        if len(approx) == 4:
            doc_cnts = approx
            break
   
    # apply warp perspective to get the top-down view
    warped = four_point_transform(orig_img, doc_cnts.reshape(4, 2))
    warped = cv2.cvtColor(warped, cv2.COLOR_BGR2GRAY)
    final_img = cv2.resize(warped, (600, 800))
    
    # write the image in the output directory
    cv2.imwrite("output" + "/" + os.path.basename(img_file), final_img)

Summary

In this tutorial, you learned how to build a simple document scanner with OpenCV. Of course, the algorithm has its limitations but I tried to make this tutorial as simple as possible so that you don't feel overwhelmed.

For example, you can see that the quality of the scanned image is a bit poor. That's because we lose too much information when we resize the image.

Indeed, try to keep the original size of the image and you will get a better result.

Also, you can apply adaptive thresholding at the final step to get a 'black and white' scanned image.

Complete and Continue

Discussion

Computer Vision with OpenCV and Python