Image Binarization

Binarization is the process of converting an image to black and white. Tesseract binarizes its input internally, but the result can be poor, especially for images with an uneven or varying background. For better OCR results, provide Tesseract with a high-quality binary image as input. Performing binarization as a separate preprocessing step gives you more control over the conversion and lets you address uneven background darkness, varying lighting conditions, and image artifacts that can hinder Tesseract's internal binarization.

ImageMagick offers binarization techniques that convert grayscale or color images to binary format while preserving important details and enhancing the contrast between text and background.

Binarization with -threshold

The ImageMagick -threshold option takes a fixed cutoff given as a percentage of the maximum intensity, such as -threshold 50%. The cutoff is not chosen automatically: pixels whose intensity exceeds the threshold become white, and all other pixels become black.
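The fixed cutoff can be illustrated with a minimal Python sketch. It is not ImageMagick's implementation: the function name `threshold_fixed`, the list-of-rows image representation, and the 8-bit range are assumptions made for illustration only.

```python
def threshold_fixed(pixels, percent=50.0, max_value=255):
    """Binarize a grayscale image given as rows of integer intensities.

    Hypothetical helper: pixels strictly above the cutoff become white,
    all others become black.
    """
    cutoff = max_value * percent / 100.0  # 50% of 255 is 127.5
    return [[max_value if p > cutoff else 0 for p in row] for row in pixels]


# A tiny 2x3 grayscale "image" with values on both sides of the cutoff.
image = [
    [10, 120, 130],
    [200, 127, 128],
]
binary = threshold_fixed(image)
```

Every pixel is compared against the same value, regardless of where it sits in the image, which is why the result depends so strongly on overall contrast and lighting.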

Example of an image binarized with the ImageMagick -threshold 50% option:

However, using a fixed threshold of 50% for image binarization has potential drawbacks:

  • Sensitivity to image contrast: images with low contrast or uneven lighting conditions may result in suboptimal binarization.
  • Loss of grayscale information: -threshold 50% may lead to the loss of subtle details and nuances present in the original image, especially if there are shades of gray that contain important information.
  • Inadequate separation of foreground and background: a fixed threshold might not be effective in separating the foreground (text) from the background in complex images. If the image contains gradients, textured backgrounds, or noise, a single threshold value may not adequately differentiate between the two.
  • Image-specific optimization: different images may require different threshold values for optimal binarization.

Example of a color image with uneven lighting and the result of the -threshold 50% ImageMagick option:

In such cases, adaptive thresholding techniques that adjust the threshold dynamically based on local image characteristics might yield better results.
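To make the idea of a locally adjusted threshold concrete, here is a minimal Python sketch of one common variant, local-mean thresholding. It is an illustration under stated assumptions, not the method used by any particular ImageMagick option or script: the function name `threshold_local_mean`, the `radius` and `offset` parameters, and the sample data are all hypothetical.

```python
def threshold_local_mean(pixels, radius=1, offset=10, max_value=255):
    """Binarize each pixel against the mean of its neighborhood.

    Hypothetical helper: a pixel becomes black only if it is darker than
    the local mean by more than `offset`; otherwise it becomes white.
    """
    height, width = len(pixels), len(pixels[0])
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            # Collect the window around (x, y), clipped at the image borders.
            window = [
                pixels[j][i]
                for j in range(max(0, y - radius), min(height, y + radius + 1))
                for i in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            local_mean = sum(window) / len(window)
            row.append(max_value if pixels[y][x] > local_mean - offset else 0)
        result.append(row)
    return result


# One row whose background brightens from left to right; the values 50 and
# 150 stand in for dark "text" pixels on that uneven background.
image = [[80, 50, 120, 140, 160, 180, 150, 220]]

fixed = [[255 if p > 127.5 else 0 for p in row] for row in image]
adaptive = threshold_local_mean(image)
```

In this sample, the fixed 50% cutoff turns the dark left part of the background black and the text pixel at 150 white, while the local-mean version marks only the two text pixels as black.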

Otsuthresh script

An effective approach to binarizing images is to use the "otsuthresh" script. It uses Otsu's thresholding method, which automatically determines the optimal threshold value for converting a grayscale or color image to black and white.

By employing Otsu's thresholding algorithm, we can overcome the challenges caused by varying color intensities and gradients in the original image. The algorithm analyzes the histogram of the image and identifies the threshold that effectively separates the foreground (text) from the background.
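The histogram analysis described above can be sketched in a few lines of Python. This is a textbook formulation of Otsu's method (choose the threshold that maximizes the between-class variance), not the source of the "otsuthresh" script; the function name `otsu_threshold` and the sample data are assumptions for illustration.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t that maximizes between-class variance.

    Pixels with value <= t form the dark class; the rest form the bright class.
    """
    hist = [0] * levels
    for row in pixels:
        for p in row:
            hist[p] += 1

    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_dark = 0    # number of pixels in the dark class so far
    sum_dark = 0  # sum of intensities in the dark class so far
    for t in range(levels):
        w_dark += hist[t]
        sum_dark += t * hist[t]
        w_bright = total - w_dark
        if w_dark == 0:
            continue  # no dark pixels yet, nothing to separate
        if w_bright == 0:
            break     # every pixel is already in the dark class
        mean_dark = sum_dark / w_dark
        mean_bright = (sum_all - sum_dark) / w_bright
        between_var = w_dark * w_bright * (mean_dark - mean_bright) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


# Dark "text" values (60-70) on a bright background (185-200).
page = [
    [60, 65, 190],
    [185, 70, 200],
    [195, 62, 188],
]
t = otsu_threshold(page)
binary = [[255 if p > t else 0 for p in row] for row in page]
```

The threshold is derived from the image's own histogram, so no percentage has to be chosen by hand; it is still a single global value applied to every pixel.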

The "otsuthresh" script is integrated into the image preprocessing pipeline. It can be called after the ImageMagick options and before OCR is performed.

To apply the "otsuthresh" script to a document image:

  • provide the relevant settings in JSON format in the Document Set Details (the script needs no specific parameters, so its parameter list is left empty):
OTSUTHRESH script JSON example
"imagePostprocessScriptsBucket": "data/ocr_sample/scripts",
"imagePostprocessScripts": {
	"otsuthresh": [],
  • initiate the Preprocess action of an IE Document Processor.

Example of the same color image with uneven lighting and the result of the "otsuthresh" script:

To experiment with the "otsuthresh" script, see Optical Character Recognition Sample Process (OCR Sample).

To find out about other ImageMagick scripts, see OCR Analysis and Built-in OCR.

Resources:

ImageMagick command line options:

https://imagemagick.org/script/command-line-options.php

Otsuthresh script:

http://www.fmwconcepts.com/imagemagick/otsuthresh/index.php