Overview

The MclClassification Sample process automatically processes PDF documents. It uses the Classification ML model to determine which of the following classes each document belongs to:

Debit Note
Invoice
Bank Statement
Contract

The MclClassification Sample lifecycle consists of the following steps:

Step 1. Prepare Documents
Step 2. Classify Documents
Step 3. Verify Extracted Data

Prerequisites

To set up and run the MclClassification Sample process:

Upload the MclClassification Sample package to the Control Server. The package is available at the following location: http://<CS host>/nexus/repository/rpaplatform/eu/ibagroup/samples/ap/easy-rpa-mcl-ap/<EasyRPA version>/easy-rpa-mcl-ap-<EasyRPA version>-bin.zip
The source code can be found here: https://code.easyrpa.eu/easyrpa/easy-rpa-samples/-/tree/dev/easy-rpa-ml-aps/easy-rpa-mcl-ap
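If you script the setup, the placeholders in the package location can be filled in programmatically. The Python sketch below copies only the path layout from this page; the helper name `package_url` and the example host and version values are hypothetical.

```python
from string import Template

# Path layout copied from the package location on this page.
# $cs_host and $version are filled in by the hypothetical helper below.
PACKAGE_URL = Template(
    "http://$cs_host/nexus/repository/rpaplatform/eu/ibagroup/samples/ap/"
    "easy-rpa-mcl-ap/$version/easy-rpa-mcl-ap-$version-bin.zip"
)


def package_url(cs_host: str, version: str) -> str:
    """Return the download URL of the MclClassification Sample package."""
    if not cs_host or not version:
        raise ValueError("cs_host and version must both be non-empty")
    return PACKAGE_URL.substitute(cs_host=cs_host, version=version)


# Example with made-up values:
# package_url("cs.example.com", "1.0.0")
```
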

MclClassification Sample Package structure

Item	Description
CL Document Processor	Standard classification automation process.
IE_SAMPLE_MULTI_CLASSIFICATION	Classification document set. Contains test samples of financial documents.
IE Sample Multi Classification	Classification document type. Defines the document classes to be determined.
Classification Task	Classification human task type. Defines the task input form in Workspace.
easy-rpa-cldp-ap-<EasyRPA version>.jar	Root archive and its dependencies. Contains the code of the CL Document Processor automation process.
ie_sample_multi_classification-<version>.tar.gz	Classification ML model.

Included Steps

Step 1. Prepare Documents

In this step, the input data for ML model execution is prepared. Document images are cleaned with ImageMagick and then passed to Tesseract OCR for text recognition. The ImageMagick and Tesseract options are specified in the Document Set details. To launch document preparation, navigate to the imported Document Set and run the "Preprocess" job. Refer to Process Documents for more details.
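The same idea can be approximated outside EasyRPA with the ImageMagick 7 (`magick`) and Tesseract command-line tools. The Python sketch below is illustrative only: the cleaning flags (`-colorspace Gray`, `-deskew 40%`, `-normalize`), the OCR language and page segmentation mode, and all helper names are assumptions, not the options EasyRPA actually applies, which come from the Document Set details. Rasterizing PDFs with ImageMagick also requires Ghostscript.

```python
import subprocess
from pathlib import Path
from typing import List

# Illustrative settings only: in EasyRPA these options are configured
# in the Document Set details, not hard-coded.
IMAGEMAGICK_OPTIONS = ["-colorspace", "Gray", "-deskew", "40%", "-normalize"]
TESSERACT_OPTIONS = ["-l", "eng", "--psm", "3"]


def imagemagick_command(pdf: Path, png: Path, page: int = 0, density: int = 300) -> List[str]:
    """Build an ImageMagick 7 command that rasterizes one PDF page and cleans it."""
    # -density must come before the input so the PDF is rendered at that DPI;
    # the [page] suffix selects a single zero-based page.
    return ["magick", "-density", str(density), f"{pdf}[{page}]",
            *IMAGEMAGICK_OPTIONS, str(png)]


def tesseract_command(png: Path, out_base: Path) -> List[str]:
    """Build a Tesseract command; Tesseract writes the text to <out_base>.txt."""
    return ["tesseract", str(png), str(out_base), *TESSERACT_OPTIONS]


def preprocess_page(pdf: Path, workdir: Path, page: int = 0) -> str:
    """Clean one page and return its OCR text (needs magick, Ghostscript and tesseract on PATH)."""
    png = workdir / f"{pdf.stem}-{page}.png"
    out_base = workdir / f"{pdf.stem}-{page}"
    subprocess.run(imagemagick_command(pdf, png, page), check=True)
    subprocess.run(tesseract_command(png, out_base), check=True)
    # Append ".txt" explicitly; with_suffix() would mangle stems that contain dots.
    return (out_base.parent / (out_base.name + ".txt")).read_text(encoding="utf-8")
```

Building the argument lists separately from running them keeps the commands easy to inspect and log before any external tool is invoked.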

Step 2. Classify Documents

Once the documents are prepared, a pre-trained ML Classification model is executed to predict the category (or categories) of each document. The model also returns a confidence score indicating how certain it is that an assigned classification tag is correct. The confidence score threshold is set in the Document Type settings. To launch document classification, navigate to the imported Document Set and run the "Execute Model" job. Refer to Execute Model for more details.
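A confidence threshold such as the one in the Document Type settings is commonly applied by keeping only the classes whose score reaches it. The Python sketch below assumes per-class scores between 0 and 1 and uses a hypothetical `apply_threshold` helper; it illustrates the general technique and is not the EasyRPA implementation. The scores and threshold values are made up.

```python
from typing import Dict, List

# Class names as listed in the Overview; the scores and threshold used in
# the usage example are made-up values for illustration.
CLASSES = ("Debit Note", "Invoice", "Bank Statement", "Contract")


def apply_threshold(scores: Dict[str, float], threshold: float) -> List[str]:
    """Return the classes whose confidence is at least `threshold`, highest score first."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    unknown = set(scores) - set(CLASSES)
    if unknown:
        raise ValueError(f"unexpected classes: {sorted(unknown)}")
    accepted = [name for name, score in scores.items() if score >= threshold]
    return sorted(accepted, key=lambda name: scores[name], reverse=True)


scores = {"Debit Note": 0.05, "Invoice": 0.91, "Bank Statement": 0.62, "Contract": 0.02}
print(apply_threshold(scores, 0.5))   # ['Invoice', 'Bank Statement']
print(apply_threshold(scores, 0.95))  # []
```

Because more than one class can pass the threshold, the result is a list; an empty list means no class was predicted with sufficient confidence.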

Step 3. Verify Extracted Data

This step enables human verification and correction to ensure classification accuracy. After the relevant classes have been assigned to a document, a human task is created that must be completed in Workspace. The task contains the ML Classification model output, which reviewers can check, validate and correct. To send documents to Workspace for human verification, navigate to the imported Document Set and run the "Move Model to Human" and "Send to Workspace" jobs. Refer to Move Model to Human and Send to Workspace for more details.
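The verification flow can be modelled in a few lines under one stated assumption: "Move Model to Human" is taken to pre-fill the human task with the model output, which the reviewer then confirms or corrects. The `ClassificationResult` record and the helper functions in this Python sketch are hypothetical and do not reflect EasyRPA's internal data model.

```python
from dataclasses import dataclass
from typing import List, Optional

CLASSES = ("Debit Note", "Invoice", "Bank Statement", "Contract")


@dataclass
class ClassificationResult:
    """Hypothetical per-document record; not EasyRPA's actual data model."""
    document_id: str
    model_classes: List[str]
    human_classes: Optional[List[str]] = None
    verified: bool = False


def move_model_to_human(result: ClassificationResult) -> None:
    """Assumed behavior: pre-fill the reviewer's answer with the model output."""
    result.human_classes = list(result.model_classes)


def complete_task(result: ClassificationResult, confirmed: List[str]) -> None:
    """Record the reviewer's confirmed or corrected classes."""
    if result.human_classes is None:
        raise RuntimeError("move_model_to_human must run before the task is completed")
    invalid = [c for c in confirmed if c not in CLASSES]
    if invalid:
        raise ValueError(f"unknown classes: {invalid}")
    result.human_classes = list(confirmed)
    result.verified = True


def final_classes(result: ClassificationResult) -> List[str]:
    """Prefer the verified human answer; fall back to the model output."""
    if result.verified and result.human_classes is not None:
        return result.human_classes
    return result.model_classes


doc = ClassificationResult("doc-001", model_classes=["Invoice"])
move_model_to_human(doc)
complete_task(doc, ["Debit Note"])  # reviewer corrects the model
print(final_classes(doc))  # ['Debit Note']
```

Keeping the model output and the human answer in separate fields preserves the original prediction, so corrections can later be compared with what the model returned.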

Multiclass Classification Sample Process (MclClassification)