Perspective distortion occurs when a document is captured at an angle or with a skewed perspective, resulting in distorted text and shapes. Correcting perspective distortion ensures that the text and content within the document are aligned properly for accurate OCR analysis.

To perform perspective distortion correction of a document image, we need to identify control points, some landmarks within the image that can serve as reference points for the distortion correction process.
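To make the role of control points concrete, the sketch below shows how four point pairs (the corners of the distorted quadrilateral and the corners of the desired rectangle) determine a perspective transform. It illustrates the general technique only and is not the implementation used by the "unperspective" script. The function names and the sample coordinates are assumptions made for this example.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a * x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("control points are degenerate")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def homography(src: Sequence[Point], dst: Sequence[Point]) -> List[float]:
    """Return h0..h7 of the perspective transform mapping each src point to dst.

    The mapping is u = (h0*x + h1*y + h2) / w, v = (h3*x + h4*y + h5) / w,
    with w = h6*x + h7*y + 1. Four point pairs give eight linear equations.
    """
    a: List[List[float]] = []
    b: List[float] = []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    return solve_linear(a, b)


def warp_point(h: Sequence[float], p: Point) -> Point:
    """Apply the perspective transform h to a single point."""
    x, y = p
    w = h[6] * x + h[7] * y + 1
    return ((h[0] * x + h[1] * y + h[2]) / w, (h[3] * x + h[4] * y + h[5]) / w)


# Hypothetical control points: corners of a skewed page and of the target rectangle.
distorted = [(10, 20), (200, 35), (220, 300), (5, 280)]
rectified = [(0, 0), (200, 0), (200, 280), (0, 280)]
h = homography(distorted, rectified)
```

Once the eight coefficients are known, every pixel of the distorted image can be mapped to its corrected position, which is why locating the control points accurately matters.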

Unperspective script overview

Using the "unperspective" script within the EasyRPA platform can be an effective way to correct perspective distortion. The script, based on ImageMagick functions, analyzes the document image, automatically identifies the control points needed for the correction, and rectifies the distortion. The technique has a limitation: it relies on accurately isolating the outline of the distorted quadrilateral in the input image while disregarding internal edges and finer details.

The "unperspective" script can correct not only perspective distortion but also rotation and skew.

Within the EasyRPA platform, the "unperspective" script is integrated into the image preprocessing pipeline. It can be called after the ImageMagick options are applied and before OCR is performed.

To apply the "unperspective" script to a document image:

Provide the relevant settings in JSON format in the Document Set Details.
Initiate the "Preprocess action" of an IE Document Processor.
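The settings are a flat list of script arguments, with each option flag followed by its value as a string. The sketch below shows one way to build such a fragment programmatically. The key names "imagePostprocessScriptsBucket" and "imagePostprocessScripts" are taken from the examples on this page; the helper `script_args` is hypothetical, and how the fragment is embedded in the rest of the Document Set Details is not covered here.

```python
import json
from typing import Dict, List


def script_args(options: Dict[str, object]) -> List[str]:
    """Flatten {"-A": 10, "-C": "black"} into ["-A", "10", "-C", "black"]."""
    args: List[str] = []
    for flag, value in options.items():
        args.append(flag)
        args.append(str(value))  # every value is passed as a string
    return args


# Hypothetical helper output, using option values from the binary-image example.
settings = {
    "imagePostprocessScriptsBucket": "data/ocr_sample/scripts",
    "imagePostprocessScripts": {
        "unperspective": script_args({"-C": "black", "-A": 10, "-s": 5, "-t": 18}),
    },
}

print(json.dumps(settings, indent=2))
```

Keeping the options in a dictionary makes it easier to adjust a single parameter, such as -t, without miscounting positions in the argument list.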

Unperspective script options

Here are the optional parameters available in "unperspective" for additional customization and control over the perspective distortion correction:

-P prerotate image; choices are: autorotate (a), 90, 180, 270; default is no prerotate; autorotate only works if the image has auto-orient metadata
-p background extraction procedure; choices are: floodfill (f), threshold (t), autothresh (a) (requires the otsuthresh script)
-C image channel to use for non-floodfill background extraction; choices are: gray, red, green, blue, cyan, magenta, yellow, black; default is no specific channel
-c pixel coordinate to extract background color; may be expressed as gravity value (e.g. northwest) or as "x,y" value; default is 0,0
-b background color outside the distorted quadrilateral; any valid IM color; default determined by coords argument
-f fuzz value for isolating quadrilateral from background; 0<=float<=100; default=10
-F morphology filter to smooth gaps and bumps on the mask boundary; integer>=0; default=0
-A area threshold for connected components filtering of mask image expressed as percent of image area; 0<integer<100; default is no connected components filtering
-a desired width/height aspect ratio; float>0; default will be computed automatically
-w desired width of output; default determined automatically from "default" parameter below; only one of width or height may be specified
-h desired height of output; default determined automatically from "default" parameter below; only one of width or height may be specified
-d default output dimension; choices are: el (length of first edge of quadrilateral used as height), bh (quadrilateral bounding box height), bw (quadrilateral bounding box width), h (input image height), w (input image width); default=el
-m method of determining quadrilateral corners from peaks in depolar image; choices are: peak (p) or derivative (d); default=peak
-t threshold value for removing false peaks; integer>=0; default=4 for method=peak; default=10 for method=derivative
-s smoothing amount to remove false peaks; float>=0; default=1 for method=peak; default=5 for method=derivative
-S sharpening amount to amplify true peaks; float>=0; default=5 for method=peak; default=0 for method=derivative
-B blurring amount for preprocessing images of text with no quadrilateral outline; float>=0
-r desired rotation of output image; choices are: 90, 180 or 270; default is no rotation
-M monitor and display textual information about processing to the terminal
-i keep ancillary processing images; choices are: view or save; default is neither
-k kind of ancillary processing images; choices are: mask, polar, edge or all; default=mask
-ma trap for maximum aspect ratio; integer>0; default=10
-ml trap for minimum edge length; integer>0; default=10
-mp trap for maximum number of false peaks before filtering to remove false peaks; integer>0; default=40
-mr trap for maximum intermediate/input dimension ratio; integer>0; default=10
-T turn off internal traps; choices are: maxaspect (ma), minlength (ml), maxpeaks (mp), maxratio (mr) or all (a)
-V disable viewport crop of output

Recommended settings for the unperspective script

Binary images

Below is an example of JSON settings for the "unperspective" script for binary images. These settings are well suited to black-and-white images in which the borders and corners of the document are well defined.

UNPERSPECTIVE script JSON settings example for Binary Images

"imagePostprocessScriptsBucket": "data/ocr_sample/scripts",
"imagePostprocessScripts": {
	"unperspective": [
		"-C",
		"black",
		"-i",
		"save",
		"-A",
		"10",
		"-s",
		"5",
		"-t",
		"18",
		"-B",
		"1"
	]
}

Here are two examples of document images that show the effect of applying the "unperspective" script with the provided settings.

Example of a binary document image with perspective distortion but unrotated (input and output images):

Example of a binary document image with perspective distortion and also rotated (input and output images):

Color images

The "unperspective" script works well for binary text images with a high-contrast background and well-defined borders and corners. With color images, such as photographs of text documents, the script may have difficulty accurately detecting the peaks required for perspective correction. In such cases, it is recommended to convert the color image to binary before applying the "unperspective" script. Binarization simplifies the image and enhances the visibility of text and geometric features, enabling the script to identify and correct the perspective distortion more effectively.

One effective way to binarize images before applying the "unperspective" script is to use the "otsuthresh" script. This script employs Otsu's thresholding method, an image segmentation technique that automatically determines the optimal threshold value for converting an image to black and white.

Once the color information is reduced to binary values, the "unperspective" script can focus solely on the text and geometric features. Binarization emphasizes high-contrast regions, which helps the script accurately detect the peaks required for perspective correction.
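To show how Otsu's method picks a threshold automatically, here is a minimal sketch in pure Python. It operates on a flat list of 8-bit grayscale values and chooses the threshold that maximizes the between-class variance. It illustrates the general method only and is not the code of the "otsuthresh" script; the function name and sample data are assumptions made for this example.

```python
from typing import List, Sequence


def otsu_threshold(pixels: Sequence[int]) -> int:
    """Return the threshold t (0..255) that maximizes between-class variance.

    Pixels with value <= t form the dark class, the rest the light class.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(value * count for value, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels in the dark class so far
    sum0 = 0    # sum of gray values in the dark class so far
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue            # dark class still empty
        w1 = total - w0
        if w1 == 0:
            break               # light class empty, nothing left to split
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(pixels: Sequence[int], t: int) -> List[int]:
    """Map every pixel to 0 (dark) or 255 (light) using threshold t."""
    return [255 if p > t else 0 for p in pixels]


# Hypothetical sample: dark text values around 20-30, light paper around 200-220.
sample = [20] * 50 + [30] * 50 + [200] * 60 + [220] * 40
t = otsu_threshold(sample)
binary = binarize(sample, t)
```

For the sample above, the chosen threshold falls between the dark and light groups, so all text pixels become 0 and all paper pixels become 255.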

Below you can find an example of JSON settings for a sequence of "otsuthresh" and "unperspective" scripts.

OTSUTHRESH and UNPERSPECTIVE script JSON settings example for Color Images

"imagePostprocessScriptsBucket": "data/ocr_sample/scripts",
"imagePostprocessScripts": {
	"otsuthresh": [],
	"unperspective": [
		"-C",
		"black",
		"-i",
		"save",
		"-A",
		"12",
		"-s",
		"5",
		"-t",
		"10",
		"-B",
		"1",
		"-d",
		"h",
		"-a",
		"0.79"
	]
}

Example of a color document image with perspective distortion (input and output images):

Troubleshooting

If "unperspective" does not work as expected on a color image, check the intermediate binary image.

Uneven background

If the color or texture of the background is too uneven, the script might not be able to separate the background cleanly from the foreground. You can mitigate the issue by adjusting the -A and -t parameters. Increasing them slightly might give better results, because a higher peak detection threshold keeps small bumps and gaps from being treated as peaks.

Example of a binary image with uneven background:

Background detection coords

By default, the pixel coordinate used to extract the background color is "0,0", which is the upper left corner. If there is a white spot at that position, the background will not be detected correctly. In that case, change the coordinate for background detection with the -c option.
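One way to pick a better coordinate is to check the image corners and use the first one that actually contains the background color. The sketch below does this for a small binary image represented as a list of rows. The helper `background_coord` and the sample image are hypothetical; only the -c option and its "x,y" format come from the option list above.

```python
from typing import List, Optional


def background_coord(image: List[List[int]], background: int) -> Optional[str]:
    """Return "x,y" of the first image corner whose pixel equals `background`.

    Corners are checked in the order: upper left, upper right,
    lower left, lower right. Returns None if no corner matches.
    """
    height, width = len(image), len(image[0])
    corners = [(0, 0), (width - 1, 0), (0, height - 1), (width - 1, height - 1)]
    for x, y in corners:
        if image[y][x] == background:
            return f"{x},{y}"
    return None


# Hypothetical binary image: black (0) background, white (255) page,
# and a stray white spot in the upper left corner.
image = [
    [255, 0, 0, 0],
    [0, 255, 255, 0],
    [0, 255, 255, 0],
    [0, 0, 0, 0],
]

coord = background_coord(image, background=0)
# Extra arguments to add to the "unperspective" settings list.
unperspective_args = ["-c", coord] if coord else []
```

Here the upper left pixel is white, so the helper skips it and selects the upper right corner instead.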

Example of a binary image with a white spot in the upper left corner:

To experiment with the "unperspective" script, download the Optical Character Recognition Sample Process (OCR Sample).

To read more about image binarization, see Image Binarization.

To find out about other ImageMagick scripts, see OCR Analysis and Built-in OCR.

Resources:

Unperspective script:

http://www.fmwconcepts.com/imagemagick/unperspective/index.php
