In this part, we will create a simple document scanner. This can be useful, for example, for scanning pages in a book.
The steps that we need to follow to build this project are:
- Convert the image to grayscale
- Find the edges in the image
- Use the edges to find all the contours
- Select only the contours of the document
- Apply warp perspective to get the top-down view of the document
Let's get started. Create a new file inside the document-scanner directory, name it scanner.py and put the following code:
from imutils.perspective import four_point_transform
import cv2

height = 800
width = 600
green = (0, 255, 0)

image = cv2.imread("input/2.jpg")
image = cv2.resize(image, (width, height))
orig_image = image.copy()
We start by importing the OpenCV library and the four_point_transform helper function from the imutils package.
This function will help us perform a 4 point perspective transform to obtain the top-down view of the document.
Then we set the height and width of the image so that we can resize it. We also create the green variable for the contour display later on.
Next, we load our image, resize it, and take a copy of it. This will allow us later to display the contours of the document on the original image rather than the modified image.
Now we can start preprocessing our image by converting it to grayscale, blurring it, and then finding the edges in the image. Let's see how to do it:
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)  # convert the image to grayscale
blur = cv2.GaussianBlur(gray, (5, 5), 0)        # add Gaussian blur
edged = cv2.Canny(blur, 75, 200)                # apply the Canny algorithm to find the edges

# Show the image and the edges
cv2.imshow('Original image:', image)
cv2.imshow('Edged:', edged)
cv2.waitKey(0)
cv2.destroyAllWindows()
Now that our image is loaded, we start by converting it from BGR (the channel order OpenCV uses when loading images) to grayscale.
Next, to remove noise from the image, we smooth it by using the cv2.GaussianBlur function.
Then we apply the Canny edge detector. This is a multi-stage algorithm that is used to remove noise and detect edges in the image.
Finally, we display our images in a window and wait for a key to close the window.
Below you can see the output that we get:
Now we can use our edged image to find all the contours.
# In OpenCV v3, v4-pre, and v4-alpha, cv2.findContours returns a tuple
# of 3 elements instead of 2, with `contours` as the second one.
# In OpenCV v2.4, v4-beta, and v4-official, the function returns a
# tuple of 2 elements, with `contours` first.
contours, _ = cv2.findContours(edged, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
contours = sorted(contours, key=cv2.contourArea, reverse=True)

# Show the image and all the contours
cv2.imshow("Image", image)
cv2.drawContours(image, contours, -1, green, 3)
cv2.imshow("All contours", image)
cv2.waitKey(0)
cv2.destroyAllWindows()
To find the contours on the image we apply the cv2.findContours function.
The cv2.drawContours function allows us to draw contours on the image.
Let's see what we get so far:
Cool! Let's keep going.
Now we need to find the biggest contour in the image that will define our document. Here is how to do it:
# go through each contour
for contour in contours:
    # approximate the contour
    peri = cv2.arcLength(contour, True)
    approx = cv2.approxPolyDP(contour, 0.05 * peri, True)

    # if we find a contour with 4 points we break the for loop
    # (we can assume that we have found our document)
    if len(approx) == 4:
        doc_cnts = approx
        break
Here we use the cv2.arcLength function to compute the perimeter of the contour. Its first argument is the contour, and the second is a boolean telling the function whether the contour is closed. True means that the contour is closed.
Then we use the cv2.approxPolyDP function to approximate the contour with another contour that has fewer vertices.
This function takes 3 arguments: the first is the contour we want to approximate, and the second specifies the approximation accuracy. In our case, we approximate the contour with an accuracy proportional to the contour perimeter (0.05 * perimeter).
The last argument is a boolean specifying whether the approximated contour is closed.
Finally, we check if the approximated contour has four points. If so, we can assume with confidence that we have found our document (we break the for loop).
You can start to see the limits of our algorithm ...
For example, if we use a document that is not a rectangle, our technique won't work.
Now we are ready to apply the four_point_transform function to get the top-down view:
# apply warp perspective first, so the green outline we draw below
# does not bleed into the scanned result
warped = four_point_transform(orig_image, doc_cnts.reshape(4, 2))
# convert the warped image to grayscale
warped = cv2.cvtColor(warped, cv2.COLOR_BGR2GRAY)

# We draw the contours on the original image, not the modified one
cv2.drawContours(orig_image, [doc_cnts], -1, green, 3)
cv2.imshow("Contours of the document", orig_image)

cv2.imshow("Scanned", cv2.resize(warped, (600, 800)))
cv2.waitKey(0)
cv2.destroyAllWindows()
Basically, the four_point_transform function takes an image and a contour as input and returns the top-down view of the image.
Here is what we get:
This simple program will loop over a directory (named input), find images in that directory, apply the technique we saw in this tutorial to get the top-down view of each document, and put it in a new directory (named output).
from pathlib import Path
import os

# ...

valid_formats = [".jpg", ".jpeg", ".png"]
get_ext = lambda f: os.path.splitext(f)[1].lower()

img_files = ["input/" + f for f in os.listdir("input") if get_ext(f) in valid_formats]

# create a new folder that will contain our images
Path("output").mkdir(exist_ok=True)

# go through each image file
for img_file in img_files:
    # read, resize, and make a copy of the image
    img = cv2.imread(img_file)
    img = cv2.resize(img, (width, height))
    orig_img = img.copy()

    # preprocess the image
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    blur = cv2.GaussianBlur(gray, (5, 5), 0)
    edged = cv2.Canny(blur, 75, 200)

    # find and sort the contours
    contours, _ = cv2.findContours(edged, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
    contours = sorted(contours, key=cv2.contourArea, reverse=True)

    # go through each contour
    for contour in contours:
        # approximate each contour
        peri = cv2.arcLength(contour, True)
        approx = cv2.approxPolyDP(contour, 0.05 * peri, True)

        # check if we have found our document
        if len(approx) == 4:
            doc_cnts = approx
            break

    # apply warp perspective to get the top-down view
    warped = four_point_transform(orig_img, doc_cnts.reshape(4, 2))
    warped = cv2.cvtColor(warped, cv2.COLOR_BGR2GRAY)
    final_img = cv2.resize(warped, (600, 800))

    # write the image to the output directory
    cv2.imwrite(os.path.join("output", os.path.basename(img_file)), final_img)
In this tutorial, you learned how to build a simple document scanner with OpenCV. Of course, the algorithm has its limitations but I tried to make this tutorial as simple as possible so that you don't feel overwhelmed.
For example, you can see that the quality of the scanned image is a bit poor. That's because we lose too much information when we resize the image.
If you keep the original size of the image, you will get a much better result.
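One way to get the best of both worlds: detect the contour on the small, resized image (fast), then scale the corners back up and warp the full-resolution image. The sketch below is hypothetical; the variable names mirror the tutorial's (width, height, doc_cnts), and the full-resolution size is fabricated so the snippet runs on its own.

```python
import numpy as np

# Pretend the original photo was 1200x1600 before we resized it.
orig_w, orig_h = 1200, 1600
width, height = 600, 800

# Corners found on the resized image, in cv2.findContours shape (4, 1, 2).
doc_cnts = np.array([[[50, 60]], [[550, 70]], [[540, 740]], [[40, 730]]])

# Scale each corner back to full-resolution coordinates.
ratio_x = orig_w / width    # 2.0 here
ratio_y = orig_h / height   # 2.0 here
scaled_cnts = (doc_cnts.reshape(4, 2) * np.array([ratio_x, ratio_y])).astype(np.float32)

# Then warp the ORIGINAL, un-resized image instead of the small one:
# warped = four_point_transform(full_res_image, scaled_cnts)
print(scaled_cnts[0])
```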
Also, you can apply adaptive thresholding at the final step to get a 'black and white' scanned image.