Multiclass Classification Sample Process (MclClassification)

Overview

The MclClassification Sample automatically processes PDF documents. It uses the Classification ML model to determine which of the following classes a document belongs to:

  • Debit Note
  • Invoice
  • Bank Statement
  • Contract

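The MclClassification Sample lifecycle includes the steps described under Included Steps: Prepare Documents, Classify Documents, and Verify Extracted Data.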

Prerequisites

To set up and run the MclClassification Sample process:

  1. Upload the MclClassification Sample package to the Control Server. The package can be found at the following location: http://<CS host>/nexus/repository/rpaplatform/eu/ibagroup/samples/ap/easy-rpa-mcl-ap/<EasyRPA version>/easy-rpa-mcl-ap-<EasyRPA version>-bin.zip
    The source code can be found here: https://code.easyrpa.eu/easyrpa/easy-rpa-samples/-/tree/dev/easy-rpa-ml-aps/easy-rpa-mcl-ap
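The package location follows a fixed pattern, so it can be assembled from the Control Server host and the EasyRPA version. The helper below is a minimal sketch: the function name `package_url` and the example host and version values are hypothetical, and it assumes both version placeholders in the path refer to the same EasyRPA version.

```python
def package_url(cs_host: str, easyrpa_version: str) -> str:
    """Build the Nexus URL of the MclClassification Sample package.

    Assumption: both version placeholders in the documented path
    are filled with the same EasyRPA version string.
    """
    base = f"http://{cs_host}/nexus/repository/rpaplatform"
    path = f"eu/ibagroup/samples/ap/easy-rpa-mcl-ap/{easyrpa_version}"
    filename = f"easy-rpa-mcl-ap-{easyrpa_version}-bin.zip"
    return f"{base}/{path}/{filename}"


if __name__ == "__main__":
    # Hypothetical host and version, for illustration only.
    print(package_url("cs.example.com", "1.0.0"))
```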

MclClassification Sample Package structure

Folder                                           Description
CL Document Processor                            Standard classification automation process.
IE_SAMPLE_MULTI_CLASSIFICATION                   Classification document set. Contains financial documents test samples.
IE Sample Multi Classification                   Classification document type. Defines the classes of the documents to determine.
Classification Task                              Classification human task type. Defines the task input form in the Workspace.
easy-rpa-cldp-ap-<EasyRPA version>.jar           Root archive and dependencies. Contains code of CL Document Processor automation process.
ie_sample_multi_classification-<version>.tar.gz  Classification ML model.

Included Steps

Step 1. Prepare Documents

In this step, the input data for ML model execution is prepared. Document images are cleaned with ImageMagick and then sent to Tesseract OCR for text recognition. The Tesseract and ImageMagick options are configured in the Document Set details. To launch document preparation, navigate to the imported Document Set and run the "Preprocess" job. Refer to Process Documents for more details.
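To illustrate what a clean-then-OCR pass looks like outside the platform, the sketch below builds the two command lines without running them. It is not the platform's implementation: the function name, file naming, and the specific option values (`-colorspace Gray`, `-deskew 40%`, `-l eng`, `--psm 3`) are illustrative assumptions, and the options actually used are whatever is configured in the Document Set details. ImageMagick 7 installations typically use `magick` in place of `convert`.

```python
from pathlib import Path
from typing import List, Tuple


def build_preprocess_commands(
    image: Path, work_dir: Path, lang: str = "eng"
) -> Tuple[List[str], List[str]]:
    """Return (ImageMagick command, Tesseract command) for one page image.

    All option values here are illustrative assumptions, not the
    options shipped with the sample Document Set.
    """
    cleaned = work_dir / f"{image.stem}_clean.png"
    ocr_base = work_dir / image.stem  # Tesseract appends the output extension

    # Clean the image: convert to grayscale and straighten skewed scans.
    clean_cmd = [
        "convert", str(image),
        "-colorspace", "Gray",
        "-deskew", "40%",
        str(cleaned),
    ]
    # Run OCR on the cleaned image with automatic page segmentation.
    ocr_cmd = ["tesseract", str(cleaned), str(ocr_base), "-l", lang, "--psm", "3"]
    return clean_cmd, ocr_cmd


if __name__ == "__main__":
    clean, ocr = build_preprocess_commands(Path("invoice.png"), Path("work"))
    print(" ".join(clean))
    print(" ".join(ocr))
```

The commands are returned as argument lists so they can be passed to `subprocess.run` unchanged once the tools are installed.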

Step 2. Classify Documents

Once the documents are prepared, a pre-trained ML Classification model is executed to predict each document's category or categories. For each assigned classification tag, the model also provides a confidence score indicating how likely the tag is to be correct. The confidence score threshold is set in the Document type settings. To launch document classification, navigate to the imported Document Set and run the "Execute Model" job. Refer to Execute Model for more details.
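The role of the confidence threshold can be sketched as a simple filter over per-class scores. This is an illustration only: the function name `select_classes`, the shape of the score dictionary, and the example numbers are assumptions, and the platform's actual handling of the threshold may differ.

```python
from typing import Dict, List, Tuple

# Classes the sample is designed to distinguish.
CLASSES = ("Debit Note", "Invoice", "Bank Statement", "Contract")


def select_classes(
    scores: Dict[str, float], threshold: float
) -> List[Tuple[str, float]]:
    """Keep the classes whose confidence meets the threshold.

    Returns (class, score) pairs sorted from most to least confident.
    An empty result means no class was predicted confidently enough.
    """
    accepted = [(label, s) for label, s in scores.items() if s >= threshold]
    return sorted(accepted, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical model output for one document.
    example = {"Debit Note": 0.05, "Invoice": 0.91,
               "Bank Statement": 0.02, "Contract": 0.02}
    print(select_classes(example, threshold=0.8))
```

With these example scores, a threshold of 0.8 keeps only "Invoice", while a threshold of 0.95 keeps nothing, which is the kind of case where human verification matters most.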

Step 3. Verify Extracted Data

This step enables human verification and correction to ensure classification accuracy. After the relevant classes have been determined for a document, a human task is created that must be completed in Workspace. The task contains the ML Classification model output, which a person can review, validate, and correct. To send documents to Workspace for human verification, navigate to the imported Document Set and run the "Move Model to Human" and "Send to Workspace" jobs. Refer to Send to Workspace and Move Model to Human for more details.
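Taken together, the three steps amount to running four Document Set jobs in sequence. The sketch below records that order and reports which job comes next. The job names are taken from the steps above; the helper `next_job` is hypothetical, and the relative order of the last two jobs is assumed from the order in which they are mentioned.

```python
from typing import List, Optional

# Job names as they appear in the step descriptions.
# Assumption: "Move Model to Human" runs before "Send to Workspace".
JOB_ORDER = [
    "Preprocess",
    "Execute Model",
    "Move Model to Human",
    "Send to Workspace",
]


def next_job(completed: List[str]) -> Optional[str]:
    """Return the first job in JOB_ORDER that has not been completed yet."""
    for job in JOB_ORDER:
        if job not in completed:
            return job
    return None  # every job has been run


if __name__ == "__main__":
    print(next_job(["Preprocess", "Execute Model"]))
```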