Overview
Prerequisites
- HT Sample Process Package structure
- Data Stores for HT Sample Automation Process:
Included Steps
Multi Document Information Extraction Human Task

Overview

HT Sample demonstrates automated processing of documents such as a Quiz, a Customer Survey, an Article, and a Financial Report (multiple source documents). The process generates documents from Data Store input, sends them to Workspace for human processing, and collects the information filled in by humans into a separate Data Store.

Quiz

In the Quiz, the user goes through a set of questions and submits answers by checking the appropriate radio buttons.

Example of the Quiz opened in Workspace:

Customer Survey

In the Customer Survey, the user goes through a set of input fields and enters personal data, such as typing a name or email, or uploading an avatar image.

Example of the survey opened in Workspace:

Article

In the Article, the user classifies what the article is about. The answer is submitted by selecting one of the checkboxes provided in the Categories choice.

Example of the Article opened in Workspace:

Finance Report (multiple source documents)

In the Finance Report, the user extracts financial data from multiple source documents. To extract a value, select a report input field (left area) and then map a value from a source document tab (right area).

Example of the Finance Report opened in Workspace with the first source document (FS) tab selected:

and with the second source document (CAS) tab selected:

Prerequisites

To set up and run the HT Sample process:

- Ensure that you have a running node with the "AP_RUN" capability.
- Upload the HT Sample package to the Control Server. The package is available at: http://<CS host>/nexus/repository/rpaplatform/eu/ibagroup/samples/ap/easy-rpa-ht-ap/<EasyRPA version>/easy-rpa-ht-ap-<EasyRPA version>-bin.zip. The source code is available at: https://code.easyrpa.eu/easyrpa/easy-rpa-samples/-/tree/dev/easy-rpa-ht-ap
- Ensure the following details are provided for the HT Sample automation process in the Automation Process Details tab:
  - Module class: eu.ibagroup.sample.ht.HtSample
  - Group Id: eu.ibagroup.samples.ap
  - Artifact Id: easy-rpa-ht-ap
  - Version Id: <EasyRPA version>
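The package path follows the standard Maven repository layout: the Group Id is split on dots into path segments, followed by the Artifact Id and the version. A minimal Python sketch of that mapping, assuming the `rpaplatform` repository name and the `-bin.zip` suffix shown in the URL above; `package_url` is a hypothetical helper, not part of EasyRPA.

```python
def package_url(cs_host: str, version: str,
                group_id: str = "eu.ibagroup.samples.ap",
                artifact_id: str = "easy-rpa-ht-ap") -> str:
    """Build the download URL of the HT Sample package (hypothetical helper)."""
    # Maven layout: "eu.ibagroup.samples.ap" -> "eu/ibagroup/samples/ap"
    group_path = group_id.replace(".", "/")
    # Assumption: the package file is named <artifactId>-<version>-bin.zip
    file_name = f"{artifact_id}-{version}-bin.zip"
    return (f"http://{cs_host}/nexus/repository/rpaplatform/"
            f"{group_path}/{artifact_id}/{version}/{file_name}")
```

For example, `package_url("cs.example.com", "1.0")` builds the URL for a placeholder host and version; substitute your Control Server host and EasyRPA version.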

HT Sample Process Package structure

Resource	Type	Description
HT Sample	Automation Process	HT Sample automation process
HT_SAMPLE_INPUT	Datastore	Data Store that contains the input data for Human Tasks
HT Sample Articles Classification	Document Type	Classification document type. Defines document categories to be identified
HT Sample Customer Survey	Document Type	Form document type. Defines document fields to be filled
HT Sample Multi Document IE	Document Type	Financial Details document type. Defines document details to be extracted
HT Sample Quiz	Document Type	Form document type. Defines document fields to be filled
Classification Task	Human Task Type	Classification human task type. Defines the task input form in the Workspace
Form Task	Human Task Type	Form human task type. Defines the task input form in the Workspace
IE Multi Doc Task	Human Task Type	Information Extraction human task type that supports multiple source documents. Defines the task input form in the Workspace
easy-rpa-ht-ap-<EasyRPA version>.jar	JAR file	Root archive and dependencies. Contains the code of the HT Sample automation process
storage/data	Storage	Provides the resources referenced by the IE Multi Doc Task Human Task Type

Data Stores for HT Sample Automation Process:

Name	Columns
HT_SAMPLE_INPUT	document_type, name, description, priority, task_input, task_output
HT_SAMPLE_RESULT	document_type, name, description, priority, task_input, task_output

- document_type - the Document Type to which the document belongs
- name - name of the input document, displayed in Workspace
- description - description of the input document, displayed in Workspace
- priority - priority with which the automation process handles the Human Task
- task_input - input data for the Human Task
- task_output - output data produced by processing the Human Task
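Both Data Stores share the same six columns, so a single record type can describe a row of either. The Python sketch below is illustrative only: it assumes that `task_input` and `task_output` are stored as JSON strings that may be empty and that `priority` is an integer. `HtSampleRecord` and `from_row` are hypothetical names, not EasyRPA APIs.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class HtSampleRecord:
    """One row of HT_SAMPLE_INPUT or HT_SAMPLE_RESULT (hypothetical model)."""
    document_type: str
    name: str
    description: str
    priority: int
    task_input: Optional[Dict[str, Any]] = None   # None when the column is empty
    task_output: Optional[Dict[str, Any]] = None  # set after the human task completes

    @classmethod
    def from_row(cls, row: Dict[str, str]) -> "HtSampleRecord":
        # Assumption: JSON columns arrive as strings and may be empty.
        def parse(value: Optional[str]) -> Optional[Dict[str, Any]]:
            return json.loads(value) if value else None

        return cls(
            document_type=row["document_type"],
            name=row["name"],
            description=row.get("description", ""),
            priority=int(row.get("priority") or 0),  # assumption: numeric priority
            task_input=parse(row.get("task_input")),
            task_output=parse(row.get("task_output")),
        )
```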

Included Steps

Step 1. Ingest Documents

The RPA bot generates documents from the HT_SAMPLE_INPUT Data Store. The Data Store record of each document contains the name of the original document, its description, priority, associated Document Type, and task input. Note that for records with an empty task_input, the document input is generated from the associated Document Type.

After this step, a separate workflow of RPA tasks is created for each document, and the documents are sent to Workspace for processing by humans.
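The ingest logic of Step 1 (one task per record, with a fallback when task_input is empty) can be sketched in Python as follows. This is a sketch under stated assumptions, not the sample's actual code. Records are plain dicts with task_input already parsed. `default_input` is a hypothetical hook that stands in for generating input from a Document Type, and each returned task represents one separate workflow.

```python
from typing import Any, Callable, Dict, List, Optional


def ingest(records: List[Dict[str, Any]],
           default_input: Callable[[str], Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Turn Data Store records into one task description per document."""
    tasks = []
    for record in records:
        # An empty or missing task_input falls back to the Document Type.
        task_input: Optional[Dict[str, Any]] = record.get("task_input") or None
        if task_input is None:
            task_input = default_input(record["document_type"])  # hypothetical hook
        tasks.append({
            "document_type": record["document_type"],
            "name": record["name"],
            "description": record.get("description", ""),
            "priority": record.get("priority", 0),
            "input": task_input,
        })
    return tasks
```

The sketch does not reorder tasks by priority because this page does not say whether higher or lower values are processed first.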

Step 2. Process Documents

Once the human tasks are created, they need to be completed in Workspace. Each task contains the input form of one of the four Document Types, which humans fill in according to the Document Type validation.

Step 3. Import Processed Data to Data Store

As soon as a human task is completed, an RPA task is created to write the collected data into the HT_SAMPLE_RESULT Data Store.
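Because HT_SAMPLE_INPUT and HT_SAMPLE_RESULT have identical columns, the result row in Step 3 can be built by copying the input row and setting task_output. The following Python sketch shows only this row construction. The actual Data Store write call is not shown because this page does not describe the EasyRPA API for it, and `to_result_record` is a hypothetical name.

```python
from typing import Any, Dict


def to_result_record(input_record: Dict[str, Any],
                     task_output: Dict[str, Any]) -> Dict[str, Any]:
    """Build an HT_SAMPLE_RESULT row from an HT_SAMPLE_INPUT row."""
    # Shallow copy so the original input row is left unchanged.
    result = dict(input_record)
    result["task_output"] = task_output  # data submitted in the human task
    return result
```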

Multi Document Information Extraction Human Task

Document Type JSON Structure

Below is an example of the Document Type JSON, including its settings, for the Multi Document Information Extraction task:

Information Extraction Document Type JSON example

{
	"importStrategy": "OVERRIDE",
	"name": "HT Sample Multi Document IE",
	"description": "HT Sample Multi Document Information Extraction",
	"humanTaskTypeName": "IE Multi Doc Task",
	"settings": {
		"colors": {
			"yearEnd": "#ff0000"
		}
	}
}

These settings contain:

colors (map, optional) - overrides the highlight colors of extracted entities
- <entityName> (hex color value) - extracted entities with the name entityName are highlighted with this color
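The override lookup can be sketched in Python. The sketch uses the override from the `colors` map when one exists and falls back to a default otherwise. It assumes both 3- and 6-digit hex values are valid. The fallback value is hypothetical, because this page does not state the Workspace's built-in colors. `highlight_color` is an illustrative helper, not a Workspace API.

```python
import re
from typing import Any, Dict, Optional

# Assumption: "#rgb" and "#rrggbb" are both acceptable hex color values.
HEX_COLOR = re.compile(r"^#(?:[0-9a-fA-F]{3}|[0-9a-fA-F]{6})$")


def highlight_color(settings: Dict[str, Any], entity_name: str,
                    default: Optional[str] = None) -> Optional[str]:
    """Return the override color for an entity, or `default` if none is set."""
    colors = settings.get("colors") or {}  # "colors" is optional
    color = colors.get(entity_name)
    if color is None:
        return default
    if not HEX_COLOR.match(color):
        raise ValueError(f"not a hex color for {entity_name!r}: {color!r}")
    return color
```

With the settings from the example above, `highlight_color({"colors": {"yearEnd": "#ff0000"}}, "yearEnd")` returns the red override.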

Input JSON for processed PDF files

The Input JSON represents the OCR results of the source documents, which are superimposed on the original document images. It is generated automatically during OCR preprocessing:

Input JSON example for PDF documents

{
	"investments": [
		{
			"auditor": {
				"name": "Ernst & Young LLP",
				"trusted": true
			},
			"cas": {
				"documentName": "Capital Account Statements for 2021.pdf",
				"images": [
					{
						"content": "${data.bucket.url}/ie-multi-doc-sample/images/cas/0/content.jpg",
						"dimensions": {
							"height": "4095",
							"width": "2896"
						},
						"json_src": "${data.bucket.url}/ie-multi-doc-sample/images/cas/0/json_src.json",
						"tessinput": "${data.bucket.url}/ie-multi-doc-sample/images/cas/0/tessinput.jpg"
					}
				]
			},
			"check": true,
			"deviation": {
				"abc": 17816.54,
				"percentage": 0.08
			},
			"fs": {
				"documentName": "Financial Statements for 2021.pdf",
				"images": [
					{
						"content": "${data.bucket.url}/ie-multi-doc-sample/images/fs/0/content.jpg",
						"dimensions": {
							"height": "4095",
							"width": "2896"
						},
						"json_src": "${data.bucket.url}/ie-multi-doc-sample/images/fs/0/json_src.json",
						"tessinput": "${data.bucket.url}/ie-multi-doc-sample/images/fs/0/tessinput.jpg"
					},
					{
						"content": "${data.bucket.url}/ie-multi-doc-sample/images/fs/1/content.jpg",
						"dimensions": {
							"height": "4095",
							"width": "2896"
						},
						"json_src": "${data.bucket.url}/ie-multi-doc-sample/images/fs/1/json_src.json",
						"tessinput": "${data.bucket.url}/ie-multi-doc-sample/images/fs/1/tessinput.jpg"
					},
					{
						"content": "${data.bucket.url}/ie-multi-doc-sample/images/fs/2/content.jpg",
						"dimensions": {
							"height": "4095",
							"width": "2896"
						},
						"json_src": "${data.bucket.url}/ie-multi-doc-sample/images/fs/2/json_src.json",
						"tessinput": "${data.bucket.url}/ie-multi-doc-sample/images/fs/2/tessinput.jpg"
					},
					{
						"content": "${data.bucket.url}/ie-multi-doc-sample/images/fs/3/content.jpg",
						"dimensions": {
							"height": "4095",
							"width": "2896"
						},
						"json_src": "${data.bucket.url}/ie-multi-doc-sample/images/fs/3/json_src.json",
						"tessinput": "${data.bucket.url}/ie-multi-doc-sample/images/fs/3/tessinput.jpg"
					}
				]
			},
			"id": 1,
			"name": "Investment 1"
		},
		{
			"auditor": {
				"name": "Smith & Johns",
				"trusted": false
			},
			"cas": {
				"documentName": "Capital Account Statements for 2021.pdf",
				"images": [
					{
						"content": "${data.bucket.url}/ie-multi-doc-sample/images/cas/0/content.jpg",
						"dimensions": {
							"height": "4095",
							"width": "2896"
						},
						"json_src": "${data.bucket.url}/ie-multi-doc-sample/images/cas/0/json_src.json",
						"tessinput": "${data.bucket.url}/ie-multi-doc-sample/images/cas/0/tessinput.jpg"
					}
				]
			},
			"check": false,
			"deviation": {
				"abc": 390318.94,
				"percentage": 5.51
			},
			"fs": {
				"documentName": "Financial Statements for 2021.pdf",
				"images": [
					{
						"content": "${data.bucket.url}/ie-multi-doc-sample/images/fs/0/content.jpg",
						"dimensions": {
							"height": "4095",
							"width": "2896"
						},
						"json_src": "${data.bucket.url}/ie-multi-doc-sample/images/fs/0/json_src.json",
						"tessinput": "${data.bucket.url}/ie-multi-doc-sample/images/fs/0/tessinput.jpg"
					},
					{
						"content": "${data.bucket.url}/ie-multi-doc-sample/images/fs/1/content.jpg",
						"dimensions": {
							"height": "4095",
							"width": "2896"
						},
						"json_src": "${data.bucket.url}/ie-multi-doc-sample/images/fs/1/json_src.json",
						"tessinput": "${data.bucket.url}/ie-multi-doc-sample/images/fs/1/tessinput.jpg"
					},
					{
						"content": "${data.bucket.url}/ie-multi-doc-sample/images/fs/2/content.jpg",
						"dimensions": {
							"height": "4095",
							"width": "2896"
						},
						"json_src": "${data.bucket.url}/ie-multi-doc-sample/images/fs/2/json_src.json",
						"tessinput": "${data.bucket.url}/ie-multi-doc-sample/images/fs/2/tessinput.jpg"
					},
					{
						"content": "${data.bucket.url}/ie-multi-doc-sample/images/fs/3/content.jpg",
						"dimensions": {
							"height": "4095",
							"width": "2896"
						},
						"json_src": "${data.bucket.url}/ie-multi-doc-sample/images/fs/3/json_src.json",
						"tessinput": "${data.bucket.url}/ie-multi-doc-sample/images/fs/3/tessinput.jpg"
					}
				]
			},
			"id": 2,
			"name": "Investment 2"
		}
	]
}

Input JSON contains:
- investments (list of objects) - the root element containing the list of investments to display. Each investment has the following structure:
  - id (number) - unique id of the investment
  - name (string) - name of the investment
  - check (boolean) - indicates whether there are inconsistencies between the Financial Statement and Capital Account Statement documents
  - auditor (object) - information about the investment auditor
    - name (string) - name of the auditor
    - trusted (boolean) - indicates whether the auditor is trusted
  - deviation (object) - deviation values for the extracted data
    - abc (number) - result of the ABC analysis
    - percentage (number) - relative deviation
  - fs (object) - result of OCR processing of the Financial Statement document
    - documentName (string) - name of the processed document (file name)
    - images (list of objects) - list of document configurations, one per page. Each image has the following structure:
      - content (URL or base64 string) - source of the page to display. It may be a URL to the page image or its content encoded in base64 (e.g. the string value "data:image/jpg;base64,R0lGOD....").
      - json_src or json (link to an OCR-JSON file, or the OCR-JSON itself) - provides OCR information. The OCR-JSON structure is described below.
      - dimensions (object) - contains the "width" and "height" of the original input document:
        - width (integer) - width of the original input document (given as a numeric string in the example above)
        - height (integer) - height of the original input document (given as a numeric string in the example above)
  - cas (object) - result of OCR processing of the Capital Account Statement document. Has the same structure as the fs property.
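This Python sketch iterates over the pages of both source documents for each investment in the Input JSON. It assumes that a page's index is its position in the `images` list, which matches the `/fs/0/`, `/fs/1/` paths in the example. The width and height are converted with `int()` because the example stores them as strings. `iter_pages` is a hypothetical helper.

```python
from typing import Any, Dict, Iterator, Tuple


def iter_pages(task_input: Dict[str, Any]) -> Iterator[Tuple[int, str, int, str, int, int]]:
    """Yield (investment id, document key, page index, content, width, height)."""
    for investment in task_input.get("investments", []):
        for doc_key in ("fs", "cas"):
            document = investment.get(doc_key)
            if not document:
                continue
            # Assumption: list position == page index within the document.
            for page, image in enumerate(document.get("images", [])):
                dims = image["dimensions"]
                # The example uses strings such as "4095", so normalize to int.
                yield (investment["id"], doc_key, page, image["content"],
                       int(dims["width"]), int(dims["height"]))
```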

The OCR-JSON object is generated by the OCR component itself and has the following structure:

OCR-JSON

"json": {
	"pages": [
		{
			"id": "page0",
			"areas": [
				{
					"id": "page0_area0",
					"paragraphs": [
						{
							"id": "page0_area0_paragraph0",
							"lines": [
								{
									"id": "page0_area0_paragraph0_line0",
									"words": [
										{
											"id": "page0_area0_paragraph0_line0_word0",
											"text": "Advanced",
											"properties": {
												"bbox": [
													0.05999032414126754,
													0.037290455011974,
													0.14078374455732948,
													0.046527540198426275
												],
												"x_fsize": 0,
												"x_wconf": 96
											}
										}
									]
								},
								...
							]
						}
					]
				}
			]
		}
	]
}

This JSON contains all OCRed words, organized in a tree-like structure: pages → areas → paragraphs → lines → words.

json (object) - root element
- pages (list of objects) - list of pages. Each page has the following structure:
  - id (string) - id of the page
  - areas (list of objects) - list of areas. Each area has the following structure:
    - id (string) - id of the area
    - paragraphs (list of objects) - list of paragraphs. Each paragraph has the following structure:
      - id (string) - id of the paragraph
      - lines (list of objects) - list of lines. Each line has the following structure:
        - id (string) - id of the line
        - words (list of objects) - list of words. Each word has the following structure:
          - id (string) - id of the word
          - text (string) - original text extracted by the OCR engine
          - properties (object) - word properties with the following structure:
            - bbox (list of numbers) - top-left and bottom-right coordinates of the rectangle around the word in the original document. Coordinates are normalized to the range 0 to 1 relative to the original document size
            - x_fsize (integer) - OCR-engine-specific font size
            - x_wconf (integer) - OCR-engine-specific confidence for the entire contained substring. Higher values express higher confidence
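The tree can be walked with nested loops, and the normalized bbox can be converted to pixels using the page dimensions from the Input JSON. This Python sketch assumes each bbox is ordered [x1, y1, x2, y2], with the top-left corner first, which is consistent with the example values. `iter_words` and `bbox_to_pixels` are hypothetical helpers.

```python
from typing import Any, Dict, Iterator, Sequence, Tuple


def iter_words(ocr: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Walk pages -> areas -> paragraphs -> lines -> words and yield each word."""
    for page in ocr.get("pages", []):
        for area in page.get("areas", []):
            for paragraph in area.get("paragraphs", []):
                for line in paragraph.get("lines", []):
                    yield from line.get("words", [])


def bbox_to_pixels(bbox: Sequence[float], width: int,
                   height: int) -> Tuple[int, int, int, int]:
    """Scale a normalized [x1, y1, x2, y2] box to pixel coordinates."""
    x1, y1, x2, y2 = bbox  # assumption: x before y, top-left then bottom-right
    return (round(x1 * width), round(y1 * height),
            round(x2 * width), round(y2 * height))
```

For a page from the example, pass the word's `properties["bbox"]` together with the page `width` and `height` (2896 and 4095) to obtain the word rectangle in image pixels.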

Output JSON for processed PDF files

As output, the PDF Information Extraction Human Task produces the following JSON:

Output JSON example for PDF Documents

{
	"entities": {
		"1": {
			"casCurrency": {
				"content": "EUR",
				"document": "cas",
				"words": [
					{
						"bbox": [
							0.319060773480663,
							0.150671550671551,
							0.352209944751381,
							0.159462759462759
						],
						"id": "page0_area10_paragraph0_line0_word4",
						"page": 0,
						"text": "EUR",
						"x_wconf": 96
					}
				]
			},
			"casDate": {
				"content": "31 December 2021",
				"document": "cas",
				"words": [
					{
						"bbox": [
							0.25621546961326,
							0.106959706959707,
							0.270718232044199,
							0.115750915750916
						],
						"id": "page0_area1_paragraph0_line0_word4",
						"page": 0,
						"text": "31",
						"x_wconf": 96
					},
					{
						"bbox": [
							0.280041436464088,
							0.106959706959707,
							0.355662983425414,
							0.115750915750916
						],
						"id": "page0_area1_paragraph0_line0_word5",
						"page": 0,
						"text": "December",
						"x_wconf": 96
					},
					{
						"bbox": [
							0.361533149171271,
							0.106959706959707,
							0.394682320441989,
							0.115750915750916
						],
						"id": "page0_area1_paragraph0_line0_word6",
						"page": 0,
						"text": "2021",
						"x_wconf": 96
					}
				]
			},
			"currency": {
				"content": "EUR",
				"document": "fs",
				"words": [
					{
						"bbox": [
							0.319060773480663,
							0.164346764346764,
							0.352209944751381,
							0.173137973137973
						],
						"id": "page0_area8_paragraph0_line0_word4",
						"page": 0,
						"text": "EUR",
						"x_wconf": 96
					}
				]
			},
			"endingCapitalBalance": {
				"content": "371,135",
				"document": "cas",
				"words": [
					{
						"bbox": [
							0.707527624309392,
							0.702319902319902,
							0.766574585635359,
							0.712576312576313
						],
						"id": "page0_area10_paragraph0_line36_word5",
						"page": 0,
						"text": "371,135",
						"x_wconf": 96
					}
				]
			},
			"fsSigned": {
				"content": "No"
			},
			"navCAS": {
				"content": "100,312",
				"document": "fs",
				"words": [
					{
						"bbox": [
							0.653660220994475,
							0.636630036630037,
							0.712361878453039,
							0.647130647130647
						],
						"id": "page3_area17_paragraph0_line0_word7",
						"page": 3,
						"text": "100,312",
						"x_wconf": 73
					}
				]
			},
			"navFS": {
				"content": "2,288,500",
				"document": "fs",
				"words": [
					{
						"bbox": [
							0.375690607734807,
							0.402930402930403,
							0.449240331491713,
							0.413431013431013
						],
						"id": "page3_area11_paragraph0_line8_word4",
						"page": 3,
						"text": "2,288,500",
						"x_wconf": 85
					}
				]
			},
			"navPerShare": {
				"content": "42,014",
				"document": "fs",
				"words": [
					{
						"bbox": [
							0.74378453038674,
							0.345787545787546,
							0.793508287292818,
							0.356288156288156
						],
						"id": "page1_area10_paragraph0_line3_word3",
						"page": 1,
						"text": "42,014",
						"x_wconf": 96
					}
				]
			},
			"navPerShareFS": {
				"content": "58,298",
				"document": "fs",
				"words": [
					{
						"bbox": [
							0.661947513812155,
							0.402930402930403,
							0.712361878453039,
							0.413431013431013
						],
						"id": "page3_area11_paragraph0_line8_word7",
						"page": 3,
						"text": "58,298",
						"x_wconf": 96
					}
				]
			},
			"netAssetValue": {
				"content": "298,826",
				"document": "cas",
				"words": [
					{
						"bbox": [
							0.707182320441989,
							0.728693528693529,
							0.766574585635359,
							0.741880341880342
						],
						"id": "page0_area10_paragraph0_line37_word8",
						"page": 0,
						"text": "298,826",
						"x_wconf": 96
					}
				]
			},
			"targetInvestment": {
				"content": "2,288,500",
				"document": "fs",
				"words": [
					{
						"bbox": [
							0.375690607734807,
							0.636630036630037,
							0.449240331491713,
							0.647130647130647
						],
						"id": "page3_area17_paragraph0_line0_word4",
						"page": 3,
						"text": "2,288,500",
						"x_wconf": 82
					}
				]
			},
			"totalCapital": {
				"content": "43,897",
				"document": "fs",
				"words": [
					{
						"bbox": [
							0.74378453038674,
							0.578998778998779,
							0.794544198895028,
							0.58949938949939
						],
						"id": "page0_area8_paragraph4_line1_word4",
						"page": 0,
						"text": "43,897",
						"x_wconf": 96
					}
				]
			},
			"totalCommitment": {
				"content": "49,161",
				"document": "cas",
				"words": [
					{
						"bbox": [
							0.716160220994475,
							0.619047619047619,
							0.763812154696133,
							0.632234432234432
						],
						"id": "page0_area10_paragraph0_line31_word8",
						"page": 0,
						"text": "49,161",
						"x_wconf": 96
					}
				]
			},
			"totalCommitmentOfFund": {
				"content": "3,232,882",
				"document": "fs",
				"words": [
					{
						"bbox": [
							0.670234806629834,
							0.51013431013431,
							0.74378453038674,
							0.520634920634921
						],
						"id": "page2_area15_paragraph0_line21_word2",
						"page": 2,
						"text": "3,232,882",
						"x_wconf": 96
					}
				]
			},
			"yearEnd": {
				"content": "31 December 2021",
				"document": "fs",
				"words": [
					{
						"bbox": [
							0.25621546961326,
							0.106959706959707,
							0.270718232044199,
							0.115750915750916
						],
						"id": "page0_area1_paragraph0_line0_word4",
						"page": 0,
						"text": "31",
						"x_wconf": 96
					},
					{
						"bbox": [
							0.280041436464088,
							0.106959706959707,
							0.355662983425414,
							0.115750915750916
						],
						"id": "page0_area1_paragraph0_line0_word5",
						"page": 0,
						"text": "December",
						"x_wconf": 96
					},
					{
						"bbox": [
							0.361533149171271,
							0.106959706959707,
							0.394682320441989,
							0.115750915750916
						],
						"id": "page0_area1_paragraph0_line0_word6",
						"page": 0,
						"text": "2021",
						"x_wconf": 96
					}
				]
			}
		},
		"2": {
			"casCurrency": {
				"content": "EUR",
				"document": "cas",
				"words": [
					{
						"bbox": [
							0.319060773480663,
							0.150671550671551,
							0.352209944751381,
							0.159462759462759
						],
						"id": "page0_area10_paragraph0_line0_word4",
						"page": 0,
						"text": "EUR",
						"x_wconf": 96
					}
				]
			},
			"casDate": {
				"content": "31 December 2021",
				"document": "cas",
				"words": [
					{
						"bbox": [
							0.25621546961326,
							0.106959706959707,
							0.270718232044199,
							0.115750915750916
						],
						"id": "page0_area1_paragraph0_line0_word4",
						"page": 0,
						"text": "31",
						"x_wconf": 96
					},
					{
						"bbox": [
							0.280041436464088,
							0.106959706959707,
							0.355662983425414,
							0.115750915750916
						],
						"id": "page0_area1_paragraph0_line0_word5",
						"page": 0,
						"text": "December",
						"x_wconf": 96
					},
					{
						"bbox": [
							0.361533149171271,
							0.106959706959707,
							0.394682320441989,
							0.115750915750916
						],
						"id": "page0_area1_paragraph0_line0_word6",
						"page": 0,
						"text": "2021",
						"x_wconf": 96
					}
				]
			},
			"currency": {
				"content": "EUR",
				"document": "fs",
				"words": [
					{
						"bbox": [
							0.319060773480663,
							0.164346764346764,
							0.352209944751381,
							0.173137973137973
						],
						"id": "page0_area8_paragraph0_line0_word4",
						"page": 0,
						"text": "EUR",
						"x_wconf": 96
					}
				]
			},
			"endingCapitalBalance": {
				"content": "371,135",
				"document": "cas",
				"words": [
					{
						"bbox": [
							0.707527624309392,
							0.702319902319902,
							0.766574585635359,
							0.712576312576313
						],
						"id": "page0_area10_paragraph0_line36_word5",
						"page": 0,
						"text": "371,135",
						"x_wconf": 96
					}
				]
			},
			"fsSigned": {
				"content": "No"
			},
			"navCAS": {
				"content": "100,312",
				"document": "fs",
				"words": [
					{
						"bbox": [
							0.653660220994475,
							0.636630036630037,
							0.712361878453039,
							0.647130647130647
						],
						"id": "page3_area17_paragraph0_line0_word7",
						"page": 3,
						"text": "100,312",
						"x_wconf": 73
					}
				]
			},
			"navFS": {
				"content": "2,288,500",
				"document": "fs",
				"words": [
					{
						"bbox": [
							0.375690607734807,
							0.402930402930403,
							0.449240331491713,
							0.413431013431013
						],
						"id": "page3_area11_paragraph0_line8_word4",
						"page": 3,
						"text": "2,288,500",
						"x_wconf": 85
					}
				]
			},
			"navPerShare": {
				"content": "42,014",
				"document": "fs",
				"words": [
					{
						"bbox": [
							0.74378453038674,
							0.345787545787546,
							0.793508287292818,
							0.356288156288156
						],
						"id": "page1_area10_paragraph0_line3_word3",
						"page": 1,
						"text": "42,014",
						"x_wconf": 96
					}
				]
			},
			"navPerShareFS": {
				"content": "58,298",
				"document": "fs",
				"words": [
					{
						"bbox": [
							0.661947513812155,
							0.402930402930403,
							0.712361878453039,
							0.413431013431013
						],
						"id": "page3_area11_paragraph0_line8_word7",
						"page": 3,
						"text": "58,298",
						"x_wconf": 96
					}
				]
			},
			"netAssetValue": {
				"content": "298,826",
				"document": "cas",
				"words": [
					{
						"bbox": [
							0.707182320441989,
							0.728693528693529,
							0.766574585635359,
							0.741880341880342
						],
						"id": "page0_area10_paragraph0_line37_word8",
						"page": 0,
						"text": "298,826",
						"x_wconf": 96
					}
				]
			},
			"targetInvestment": {
				"content": "2,288,500",
				"document": "fs",
				"words": [
					{
						"bbox": [
							0.375690607734807,
							0.636630036630037,
							0.449240331491713,
							0.647130647130647
						],
						"id": "page3_area17_paragraph0_line0_word4",
						"page": 3,
						"text": "2,288,500",
						"x_wconf": 82
					}
				]
			},
			"totalCapital": {
				"content": "43,897",
				"document": "fs",
				"words": [
					{
						"bbox": [
							0.74378453038674,
							0.578998778998779,
							0.794544198895028,
							0.58949938949939
						],
						"id": "page0_area8_paragraph4_line1_word4",
						"page": 0,
						"text": "43,897",
						"x_wconf": 96
					}
				]
			},
			"totalCommitment": {
				"content": "49,161",
				"document": "cas",
				"words": [
					{
						"bbox": [
							0.716160220994475,
							0.619047619047619,
							0.763812154696133,
							0.632234432234432
						],
						"id": "page0_area10_paragraph0_line31_word8",
						"page": 0,
						"text": "49,161",
						"x_wconf": 96
					}
				]
			},
			"totalCommitmentOfFund": {
				"content": "3,232,882",
				"document": "fs",
				"words": [
					{
						"bbox": [
							0.670234806629834,
							0.51013431013431,
							0.74378453038674,
							0.520634920634921
						],
						"id": "page2_area15_paragraph0_line21_word2",
						"page": 2,
						"text": "3,232,882",
						"x_wconf": 96
					}
				]
			},
			"yearEnd": {
				"content": "31 December 2021",
				"document": "fs",
				"words": [
					{
						"bbox": [
							0.25621546961326,
							0.106959706959707,
							0.270718232044199,
							0.115750915750916
						],
						"id": "page0_area1_paragraph0_line0_word4",
						"page": 0,
						"text": "31",
						"x_wconf": 96
					},
					{
						"bbox": [
							0.280041436464088,
							0.106959706959707,
							0.355662983425414,
							0.115750915750916
						],
						"id": "page0_area1_paragraph0_line0_word5",
						"page": 0,
						"text": "December",
						"x_wconf": 96
					},
					{
						"bbox": [
							0.361533149171271,
							0.106959706959707,
							0.394682320441989,
							0.115750915750916
						],
						"id": "page0_area1_paragraph0_line0_word6",
						"page": 0,
						"text": "2021",
						"x_wconf": 96
					}
				]
			}
		}
	}
}

It has the following structure:

entities (map) - root element containing the extracted entities, grouped by investment
- <investmentId> (map) - entities extracted for the investment with the id investmentId
  - <entityName> (object) - information about the entity with the name entityName
    - content (string) - final output text of the extracted entity
    - document ("fs" | "cas") - type of the document the entity was extracted from
    - words (list of objects) - list of word objects. If the extracted text consists of several words, the list contains one object per word:
      - text (string) - original text from the input document
      - bbox (list of numbers) - top-left and bottom-right coordinates of the rectangle surrounding the word in the original document. The coordinates are normalized to the range 0 to 1 relative to the original document size
      - id (string) - id of the word
      - page (integer) - zero-based page number of the original document where the word appears
      - x_wconf (integer) - OCR-engine-specific confidence for the word
    Some entities, such as fsSigned in the example above, contain only content, without document and words.
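Downstream processing, such as loading results into a table or flagging low-confidence values, can use a flat row for each extracted entity. The Python sketch below builds such rows. It assumes investment keys are numeric strings like "1", matching the `id` values in the Input JSON, and it handles content-only entities such as fsSigned. `flatten_entities` and the row field names are hypothetical.

```python
from typing import Any, Dict, List


def flatten_entities(output: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten {"entities": {investmentId: {entityName: {...}}}} into rows."""
    rows = []
    for investment_id, entities in output.get("entities", {}).items():
        for entity_name, entity in entities.items():
            words = entity.get("words", [])  # absent for content-only entities
            rows.append({
                "investment_id": int(investment_id),  # JSON map keys are strings
                "entity": entity_name,
                "content": entity.get("content"),
                "document": entity.get("document"),
                "pages": sorted({w["page"] for w in words}),
                # Lowest word confidence, or None if no words were mapped.
                "min_confidence": min((w["x_wconf"] for w in words), default=None),
            })
    return rows
```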

Human Task Sample (HT Sample)