Signature Detection & Recognition

Overview

The Signature Detection & Recognition Sample addresses a common problem in financial document processing: documents such as invoices contain signatures that need to be checked. The sample shows one way to automate the extraction of PDF invoice data together with signature detection and recognition. This is achieved by applying two models in sequence, Information Extraction (IE) and Signature Detection, and then running a postprocessor and validator pair.

Prerequisites

To set up and run the Signature Recognition Demo:

  1. Ensure that you have a running node with the "AP_RUN" capabilities.
  2. Upload the Signature Recognition Sample package to the Control Server. The package can be found in the following directory: https://<CS host>/nexus/repository/rpaplatform/eu/ibagroup/samples/ap/easy-rpa-signature-recognition-ap/<EasyRPA version>/easy-rpa-signature-recognition-ap-<EasyRPA version>-bin.zip
    The source code can be found here: https://code.easyrpa.eu/easyrpa/easy-rpa-samples/-/tree/dev/easy-rpa-ml-aps/easy-rpa-signature-recognition-ap
  3. Ensure the following details are provided for the Signature Recognition Sample automation process in the Automation Process Details tab:

    Module class: eu.ibagroup.sample.ml.signature.recognition.SignatureRecognitionSample

    Group Id: eu.ibagroup.samples.ap

    Artifact Id: easy-rpa-signature-recognition-ap

    Version Id: <EasyRPA version>

  4. Ensure the following models are available in the model list of the Control Server:

    • base_signature_detection_yolo5_model 
    • idp_sample_invoice
  5. To run this demo, launch only the Signature Recognition Sample automation process.
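The package URL in step 2 follows the usual Maven repository layout: the Group Id becomes a directory path, followed by the Artifact Id and the version. A minimal sketch of how the URL can be assembled from those coordinates is shown below. The function name `package_url` and the host and version values in the usage line are hypothetical placeholders, not values from the platform.

```python
def package_url(cs_host, version):
    """Assemble the download URL of the sample package from its Maven coordinates."""
    group_id = "eu.ibagroup.samples.ap"                # Group Id from step 3
    artifact_id = "easy-rpa-signature-recognition-ap"  # Artifact Id from step 3
    group_path = group_id.replace(".", "/")            # dots become path separators
    return (
        f"https://{cs_host}/nexus/repository/rpaplatform/"
        f"{group_path}/{artifact_id}/{version}/"
        f"{artifact_id}-{version}-bin.zip"
    )


# Hypothetical host and version, for illustration only.
print(package_url("cs.example.com", "1.0.0"))
```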



IDP Package Structure:

Folder

Description

Signature Recognition Sample

Signature Recognition Sample automation process

IE Document Processor

Standard information extraction automation process

Signature Recognition Sample Invoice

Invoice information extraction document type. Defines the entities to be extracted from invoices, as well as the postprocessor and validator pair that recognizes and validates detected signatures:

	"mlPostProcessors": [
		{
			"entityName": "Signature",
			"name": "tagDetectedSignatures",
			"referencesMap": {
			"ceoSignature":"signature_recognition_sample/input/CeoSignature.jpg",
			"cooSignature":"signature_recognition_sample/input/CooSignature.jpg"
			},
			"similarity": 0.8
		}
	],
	"validators": [
		{
			"entityName": "Signature",
			"name": "hasAllValues",
			"mandatoryValues": ["ceoSignature","cooSignature"],
			"message": {
			"severity": "error",
			"text": "Document must be signed by both CEO and COO."
			 }
		}
	]

Information Extraction Task

Information Extraction human task type. Defines the task input form in the Workspace

easy-rpa-signature-recognition-ap-<EasyRPA version>.jar

Root archive and dependencies. Contains code of IDP Sample automation process

models

Two ML models provided

  • base_signature_detection_yolo5_model (only declaration - model implementation is already available at the Control Server)
  • idp_sample_invoice
storage/data

Folder that contains the documents to be uploaded to File Storage, as well as the reference images of the CEO and COO signatures.

Configuration Parameters for IDP Sample Automation Process:

Key

Default Value

Description

inputFolder

signature_recognition_sample/input

File Storage folder where input documents and reference images are stored.

fileFilter

.*\.pdf

Regular expression for files to select.

configuration

{
  "Invoice": {
    "dataStore": "SIGNATURE_RECOGNITION_SAMPLE_DOCUMENTS",
    "documentType": "Signature Recognition Sample Invoice",
    "model": "idp_sample_invoice",
    "runModel": "idp_sample_invoice,1.0.11",
    "storagePath": "signature_recognition_sample",
    "bucket": "data",
    "tesseractOptions": [
      "-l",
      "eng",
      "--psm",
      "12",
      "--oem",
      "3",
      "--dpi",
      "150"
    ],
    "imageMagickOptions": [
      "-units",
      "PixelsPerInch",
      "-resample",
      "150",
      "-density",
      "150",
      "-quality",
      "100",
      "-background",
      "white",
      "-deskew",
      "40%",
      "-contrast",
      "-alpha",
      "flatten"
    ]
  },
  "Signature Recognition": {
    "dataStore": "SIGNATURE_RECOGNITION_SAMPLE_DOCUMENTS",
    "documentType": "Signature Recognition Sample Invoice",
    "model": "base_signature_detection_yolo5_model",
    "runModel": "base_signature_detection_yolo5_model,0.1",
    "storagePath": "signature_recognition_sample",
    "bucket": "data",
    "tesseractOptions": [
      "-l",
      "eng",
      "--psm",
      "12",
      "--oem",
      "3",
      "--dpi",
      "150"
    ],
    "imageMagickOptions": [
      "-units",
      "PixelsPerInch",
      "-resample",
      "150",
      "-density",
      "150",
      "-quality",
      "100",
      "-background",
      "white",
      "-deskew",
      "40%",
      "-contrast",
      "-alpha",
      "flatten"
    ]
  }
}

JSON parameter that maps document types to the corresponding ML models. Each entry contains the model name, the model version and the document type name.



JAVA_OPTS

-Djavax.accessibility.assistive_technologies=

Workaround for the Sikuli library failing to run on a Linux node with Java 8. It can be removed if the node is not a Linux node running Java 8.
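To illustrate how the configuration parameter ties document types to models, the sketch below parses a trimmed copy of the JSON and splits each runModel value into a model name and a version. It assumes that runModel always has the form "name,version", which is inferred from the default values above. The helper names `parse_run_model` and `summarize_configuration` are hypothetical and not part of the platform API.

```python
import json

# Trimmed copy of the default configuration; only the keys used below are kept.
CONFIGURATION = """
{
  "Invoice": {
    "documentType": "Signature Recognition Sample Invoice",
    "runModel": "idp_sample_invoice,1.0.11"
  },
  "Signature Recognition": {
    "documentType": "Signature Recognition Sample Invoice",
    "runModel": "base_signature_detection_yolo5_model,0.1"
  }
}
"""


def parse_run_model(run_model):
    """Split a 'name,version' string (assumed format) into its two parts."""
    name, _, version = run_model.partition(",")
    return name.strip(), version.strip()


def summarize_configuration(configuration_json):
    """Return document type, model name and model version for each entry."""
    summary = {}
    for key, entry in json.loads(configuration_json).items():
        name, version = parse_run_model(entry["runModel"])
        summary[key] = {
            "documentType": entry["documentType"],
            "modelName": name,
            "modelVersion": version,
        }
    return summary


for key, value in summarize_configuration(CONFIGURATION).items():
    print(key, value)
```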

Data Store for IDP Sample Automation Process:

Name

Columns

SIGNATURE_RECOGNITION_SAMPLE_DOCUMENTS

ie_result, error_message, uuid, name, notes, status, url, s3_path, ocr_json, input_json, output_json, model_output_json, update_timestamp

Columns description:
  • ie_result - result of information extraction model execution on the document validated in Human Task.
  • error_message - message displayed in case of an error.
  • uuid - unique identifier of the document.
  • name - input document name as it appears in a human task.
  • notes - input document path inside file storage bucket.
  • status - document processing status.
  • url - input document file storage path.
  • s3_path - input document path inside file storage bucket.
  • ocr_json - result of OCR execution on the document.
  • input_json - document input data for the latest human task.
  • output_json - document output data of the latest human task.
  • model_output_json - temporary field containing latest result of the executed model.
  • update_timestamp - last update time of the data store record.

Included Steps

Step 1. Ingest Documents

The RPA bot retrieves documents from the dedicated folder in File Storage. It compiles a list of documents to be processed and creates records in the data store. The data store record of each document contains the name of the original document, the path to the File Storage folder with the original document, and an associated uuid. The result of each processing step of the Signature Recognition Sample automation process is also recorded in the data store. A document that has just been picked up for processing has the status 'NEW'.

Once the list of documents has been generated, the RPA bot prepares batches of documents for processing. The number of documents in a batch is determined by the configuration parameter batchSize.

After this step a separate workflow of RPA and ML tasks is created for each document.
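The ingestion logic can be sketched roughly as follows: filter the folder listing with the fileFilter expression, create a 'NEW' record per document, and split the records into batches. This is an illustration under stated assumptions, not the platform implementation. It assumes the regular expression must match the whole file name, that s3_path is the input folder joined with the file name, and it uses hypothetical function names, file names and batch size.

```python
import re
import uuid


def ingest(file_names, file_filter=r".*\.pdf",
           input_folder="signature_recognition_sample/input"):
    """Create one 'NEW' record for every file name matching the filter."""
    pattern = re.compile(file_filter)
    records = []
    for name in file_names:
        # Assumption: the expression must match the whole file name.
        if pattern.fullmatch(name):
            records.append({
                "uuid": str(uuid.uuid4()),
                "name": name,
                "s3_path": f"{input_folder}/{name}",  # assumed path layout
                "status": "NEW",
            })
    return records


def make_batches(records, batch_size):
    """Split the records into consecutive batches of at most batch_size items."""
    return [records[i:i + batch_size] for i in range(0, len(records), batch_size)]


# Hypothetical folder listing: reference images are skipped by the default filter.
listing = ["invoice1.pdf", "CeoSignature.jpg", "invoice2.pdf", "invoice3.pdf"]
records = ingest(listing)
for batch in make_batches(records, batch_size=2):
    print([record["name"] for record in batch])
```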

Step 2. Prepare Documents

At this step, input data for ML model execution is prepared. Document images are cleaned with ImageMagick and passed to Tesseract for OCR. The File Storage bucket name, the Tesseract options and the ImageMagick options are provided in the configuration parameters. Files created while processing the original document are saved to the same signature_recognition_sample File Storage folder where the original document is stored.
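To make the role of the imageMagickOptions and tesseractOptions arrays more concrete, the sketch below turns them into command-line argument lists. How the platform actually invokes the two tools is not described here, so the executable names (`convert`, `tesseract`), the placement of the options relative to the file arguments, and the file names are all assumptions. The commands are only printed, not executed.

```python
TESSERACT_OPTIONS = ["-l", "eng", "--psm", "12", "--oem", "3", "--dpi", "150"]
IMAGEMAGICK_OPTIONS = [
    "-units", "PixelsPerInch", "-resample", "150", "-density", "150",
    "-quality", "100", "-background", "white", "-deskew", "40%",
    "-contrast", "-alpha", "flatten",
]


def imagemagick_command(source, target, options):
    """Argument list for cleaning an image (assumed layout: input, options, output)."""
    # Assumption: the legacy 'convert' executable; ImageMagick 7 also ships 'magick'.
    return ["convert", source, *options, target]


def tesseract_command(image, output_base, options):
    """Argument list for OCR: tesseract <image> <output base> [options]."""
    return ["tesseract", image, output_base, *options]


# Hypothetical file names, for illustration only.
print(" ".join(imagemagick_command("invoice1.pdf", "invoice1.png", IMAGEMAGICK_OPTIONS)))
print(" ".join(tesseract_command("invoice1.png", "invoice1", TESSERACT_OPTIONS)))
```

Such a list could be passed to `subprocess.run` on a machine where both tools are installed.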

Step 3. Extract Data

The Invoice ML Information Extraction model is employed to extract the specific fields that need to be stored in the target system.

Step 4. Detect Signature

The Signature Recognition ML model is employed to detect signatures in the document, if there are any.

Step 5. Postprocess Detected Signature Data

At this stage a single postprocessor, tagDetectedSignatures, is employed to recognize detected signatures by matching them against the provided list of signature references.

NOTE: The 'tagDetectedSignatures' postprocessor was created for demo purposes only. For production use, replace the Sikuli library with dedicated signature matching/verification software.
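The tagging idea can be sketched independently of the image matching library. In the sketch below, each detected signature is compared with every reference and receives the name of the best match if the score reaches the configured similarity (0.8 in the sample document type). The function `tag_detected_signatures`, the pluggable `similarity_fn`, the inclusive threshold comparison and the toy scores are all assumptions for illustration. The sample itself relies on Sikuli for the actual image comparison.

```python
def tag_detected_signatures(detected, references, similarity_fn, similarity=0.8):
    """Tag each detected signature with the best matching reference name, or None."""
    tags = []
    for signature in detected:
        best_name, best_score = None, 0.0
        for name, reference in references.items():
            score = similarity_fn(signature, reference)
            if score > best_score:
                best_name, best_score = name, score
        # Assumption: a score equal to the threshold counts as a match.
        tags.append(best_name if best_score >= similarity else None)
    return tags


# Toy data: strings stand in for images, and scores come from a lookup table.
references = {"ceoSignature": "ceo_ref", "cooSignature": "coo_ref"}
scores = {
    ("sig_a", "ceo_ref"): 0.93, ("sig_a", "coo_ref"): 0.41,
    ("sig_b", "ceo_ref"): 0.30, ("sig_b", "coo_ref"): 0.50,
}
tags = tag_detected_signatures(
    ["sig_a", "sig_b"], references, lambda a, b: scores[(a, b)]
)
print(tags)
```

In this toy run the first signature is tagged as ceoSignature and the second one stays untagged, because its best score is below the threshold.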

Step 6. Validate Recognized Signatures

At this stage a single validator is employed: hasAllValues, an OOTB validator that ensures the document is signed by both the CEO and the COO. If any mandatory signature is missing, the error message "The Signature validation failed: Document must be signed by both CEO and COO." is added to the list of validation messages.
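The check performed by such a validator can be sketched as a simple membership test: every mandatory value must appear among the values recognized for the entity. The function below is a hypothetical illustration, not the OOTB implementation. The message prefix "The <entityName> validation failed:" is inferred from the error text quoted above.

```python
def has_all_values(entity_name, values, mandatory_values, message_text,
                   severity="error"):
    """Return a list of validation messages; empty if all mandatory values are present."""
    missing = [value for value in mandatory_values if value not in values]
    if not missing:
        return []
    return [{
        "severity": severity,
        # Assumed message format, inferred from the documented error text.
        "text": f"The {entity_name} validation failed: {message_text}",
    }]


# Only the CEO signature was recognized, so one error message is produced.
messages = has_all_values(
    "Signature",
    ["ceoSignature"],
    ["ceoSignature", "cooSignature"],
    "Document must be signed by both CEO and COO.",
)
print(messages)
```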

Step 7. Verify Extracted Data

This step enables human verification and correction to ensure the accuracy of the data extracted from a document before it is imported into the target system. After the relevant business entities have been extracted from a document, a human task is created and must be completed in Workspace. The task contains the ML Information Extraction model output, the ML Signature Recognition model output and the validator messages, which a human can review, validate and correct.