Multiclass Classification Sample Process (MclClassification)
Overview
The MclClassification Sample automatically processes PDF documents. It uses the Classification ML model to determine which of the following classes a document belongs to:
- Debit Note
- Invoice
- Bank Statement
- Contract
The MclClassification Sample lifecycle includes three steps: Prepare Documents, Classify Documents, and Verify Extracted Data.
Prerequisites
To set up and run the MclClassification Sample Process:
Upload the MclClassification Sample package to the Control Server. The package can be downloaded from the following location: http://<CS host>/nexus/repository/rpaplatform/eu/ibagroup/samples/ap/easy-rpa-mcl-ap/<EasyRPA version>/easy-rpa-mcl-ap-<EasyRPA version>-bin.zip
The source code can be found here: https://code.easyrpa.eu/easyrpa/easy-rpa-samples/-/tree/dev/easy-rpa-ml-aps/easy-rpa-mcl-ap
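The package URL pattern above can be filled in programmatically. The following is a minimal sketch, not part of the sample itself: the function name `package_url` is hypothetical, and it assumes that both version placeholders in the URL refer to the same EasyRPA version.

```python
def package_url(cs_host: str, easyrpa_version: str) -> str:
    """Fill the <CS host> and <EasyRPA version> placeholders of the package URL.

    Assumption: the same version string is used in both the directory
    name and the archive file name.
    """
    base = (
        f"http://{cs_host}/nexus/repository/rpaplatform"
        "/eu/ibagroup/samples/ap/easy-rpa-mcl-ap"
    )
    return f"{base}/{easyrpa_version}/easy-rpa-mcl-ap-{easyrpa_version}-bin.zip"
```

For example, `package_url("cs.example.com", "1.2.3")` produces a URL that ends in `easy-rpa-mcl-ap-1.2.3-bin.zip`. The host name and version here are placeholders, not real values.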
MclClassification Sample Package structure
Folder | Description |
---|---|
CL Document Processor | Standard classification automation process. |
IE_SAMPLE_MULTI_CLASSIFICATION | Classification document set. Contains test samples of financial documents. |
IE Sample Multi Classification | Classification document type. Defines the classes of the documents to determine |
Classification Task | Classification human task type. Defines the task input form in the Workspace |
easy-rpa-cldp-ap-<EasyRPA version>.jar | Root archive and dependencies. Contains code of CL Document Processor automation process |
ie_sample_multi_classification-<version>.tar.gz | Classification ML model |
Included Steps
Step 1. Prepare Documents
In this step, the input data for ML model execution is prepared. Document images are cleaned with ImageMagick and then sent to Tesseract OCR for text recognition. The Tesseract and ImageMagick options are specified in the Document Set details. To launch document preparation, navigate to the imported Document Set and run the "Preprocess" job. Refer to Process Documents for more details.
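To make the clean-then-OCR order of this step concrete, the sketch below builds the two command lines that such a pipeline might run for a single image. It is an illustration only: the function name, the output file naming, and the specific options (`-colorspace Gray`, `-deskew 40%`, `-l eng`) are assumptions, not the options configured in the sample's Document Set. The commands are only constructed, not executed.

```python
from pathlib import Path


def build_preprocess_commands(image: Path, workdir: Path, lang: str = "eng"):
    """Return (imagemagick_cmd, tesseract_cmd) as argument lists.

    Assumed options, for illustration only; the real ones come from
    the Document Set details.
    """
    cleaned = workdir / f"{image.stem}_clean.png"
    ocr_base = workdir / image.stem  # Tesseract appends the extension itself

    # ImageMagick 6 style entry point; ImageMagick 7 prefers `magick`.
    magick_cmd = [
        "convert", str(image),
        "-colorspace", "Gray",   # drop color information
        "-deskew", "40%",        # straighten a slightly rotated scan
        str(cleaned),
    ]
    # Tesseract reads the cleaned image and writes <ocr_base>.txt
    tesseract_cmd = ["tesseract", str(cleaned), str(ocr_base), "-l", lang]
    return magick_cmd, tesseract_cmd
```

The returned lists can be passed to `subprocess.run(..., check=True)` on a machine where both tools are installed. Building the commands separately keeps the example inspectable without requiring either tool.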
Step 2. Classify Documents
Once documents are prepared, a pre-trained ML Classification model is executed to predict each document’s category or categories. The model also reports a confidence score indicating how confident it is that the assigned classification tag is correct. The confidence score threshold is set in the Document Type settings. To launch document classification, navigate to the imported Document Set and run the "Execute Model" job. Refer to Execute Model for more details.
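The role of the confidence threshold can be sketched as follows. This is a simplified illustration, not the platform's implementation: the function name `select_classes`, the score format (a mapping from class name to a value between 0 and 1), and the use of `>=` for the comparison are all assumptions.

```python
def select_classes(scores: dict[str, float], threshold: float) -> list[str]:
    """Return the class names whose confidence meets the threshold.

    Assumption: a class is accepted when score >= threshold. Accepted
    classes are returned from most to least confident.
    """
    accepted = [(label, score) for label, score in scores.items() if score >= threshold]
    accepted.sort(key=lambda pair: pair[1], reverse=True)
    return [label for label, _ in accepted]


# Hypothetical model output for one document
example_scores = {
    "Invoice": 0.91,
    "Debit Note": 0.55,
    "Contract": 0.03,
    "Bank Statement": 0.01,
}
```

With these made-up scores, a threshold of 0.5 keeps "Invoice" and "Debit Note", while a threshold of 0.95 keeps no class at all. The second case is the kind of low-confidence result a reviewer would likely want to look at more closely.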
Step 3. Verify Extracted Data
This step enables human verification and correction to ensure classification accuracy. After the relevant classes have been determined for a document, a human task is created that must be completed in Workspace. The task contains the ML Classification model output, which a person can review, validate, and correct. To send documents to Workspace for human verification, navigate to the imported Document Set and run the "Move Model to Human" and "Send to Workspace" jobs. Refer to Send to Workspace and Move Model to Human for more details.