Assessing Business Case
Evaluate the problem before focusing on the data
The biggest challenge when you start working on an ML project is to understand the problem clearly before focusing on the data. Before you start thinking about how to solve the customer’s problem with ML, assess the business case with the following questions in mind:
- What problem is the customer trying to solve?
- Is Machine Learning the right tool to solve this problem?
It is important to have a well-defined problem before deciding on an ML approach to recognizing patterns in data. Machine learning can help automate business processes, but automation problems do not always require learning. Automation without learning is appropriate for tasks with clear, predefined steps that are currently executed by a human and can be handed over to a robot. Machine learning methods solve business problems that require learning from data and making predictions, for instance, evaluating how similar a piece of text is to previously seen texts.
Decide on the problem type
Once you verify that the business problem is suitable for machine learning, decide what problem type you are dealing with:
- What would you like your machine learning model to do?
- What type of use case is it, classification or information extraction?
When formulating the problem type, think about what the model will predict. The two most common supervised machine learning problem types for business documents are classification and information extraction. With classification systems, we seek yes-or-no predictions such as ‘Is it an invoice?’ or ‘Is it a spam message?’. Information extraction systems tackle the problem of extracting particular pieces of information from unstructured text. Understanding the problem type helps you analyze the documents properly, derive data-driven insights, and gather appropriate data to train a model.
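To make the difference between the two problem types concrete, the sketch below shows only the shape of the output each one produces. The functions `is_invoice` and `extract_fields` are hypothetical, rule-based stand-ins rather than trained models, and the sample text and field names are assumptions made for illustration.

```python
import re

def is_invoice(text):
    """Classification: one yes-or-no answer for the whole document."""
    # Hypothetical keyword rule standing in for a trained classifier.
    return "invoice" in text.lower()

def extract_fields(text):
    """Information extraction: pull specific values out of unstructured text."""
    # Hypothetical patterns standing in for a trained extraction model.
    number = re.search(r"Invoice\s+No\.?\s*(\w+)", text, re.IGNORECASE)
    total = re.search(r"Total:?\s*([\d.,]+)", text, re.IGNORECASE)
    return {
        "invoice_number": number.group(1) if number else None,
        "total": total.group(1) if total else None,
    }

sample = "Invoice No. 4711 for consulting services. Total: 1250.00 EUR"
print(is_invoice(sample))      # True
print(extract_fields(sample))  # {'invoice_number': '4711', 'total': '1250.00'}
```

A classifier returns a single label per document, while an extractor returns a set of named values, so the two problem types call for differently annotated training data.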
Examine the available data
After framing the problem for machine learning, the next step is to evaluate whether you have the right data to solve it:
- What data should the machine learning system use to make predictions?
- What are the sources of your data? How many training examples is it possible to provide?
Machine learning requires data and should only be applied when you have access to a sizable dataset from which to train a model. There is no simple answer to how much data you need or whether it is likely to be a good fit for your problem. Every feature that you include in your model increases the number of data records you need to train it. You should also account for splitting your dataset into separate subsets for training and testing.
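As a minimal sketch of the training and testing split mentioned above, the example below shuffles a list of records and holds out a fraction for testing. The helper name `split_dataset`, the 20% test fraction, and the placeholder document names are assumptions for illustration; the right proportion depends on how much data is available.

```python
import random

def split_dataset(records, test_fraction=0.2, seed=0):
    """Shuffle records and split them into (train, test) lists."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(records)
    # A fixed seed keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Placeholder records; in practice these would be labeled documents.
documents = [f"doc_{i}" for i in range(100)]
train, test = split_dataset(documents)
print(len(train), len(test))  # 80 20
```

The records held out for testing are not available for training, so count them in when estimating how many examples the customer needs to provide.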
Define the outcome
For a given document input, your machine learning model will learn to predict business-specific outputs that will be integrated into actual business processes, products, and services:
- What outcome is the customer trying to achieve?
- What are the most relevant factors for predicting the customer’s specific outcome?
An algorithm doesn’t understand the business world. It’s crucial that a business analyst gets substantial information from the customer about the business logic of the documents and about which data is actually relevant, so that they can select and slice the data in a way the algorithm will understand.
Know the success metrics
The last step before you begin the process is to decide what success means and when the model development phase can be considered complete. Consider questions such as:
- Is your problem the kind of problem where getting things right 80% of the time is enough?
- Are there certain kinds of errors that should never be allowed?
Every machine learning algorithm is error-prone. The prediction engine will get things right most of the time, but it will inevitably make mistakes. To avoid the temptation to keep refining the model forever for small improvements in accuracy, discuss with your customer what level of accuracy is sufficient for their needs and what the consequences of the corresponding error rate might be.
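One way to make an agreed success criterion explicit is to check it in code. The sketch below combines a minimum accuracy with a set of error types that must never occur. The function name `meets_success_criteria`, the 80% threshold, and the labels are assumptions for illustration, not a prescribed evaluation method.

```python
def meets_success_criteria(y_true, y_pred, min_accuracy=0.8, forbidden=()):
    """Return (accuracy, forbidden_errors, passed) for a set of predictions."""
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    # Each forbidden entry is a (true_label, predicted_label) pair
    # that the customer has said must never happen.
    forbidden_errors = [pair for pair in pairs if pair in set(forbidden)]
    passed = accuracy >= min_accuracy and not forbidden_errors
    return accuracy, forbidden_errors, passed

y_true = ["invoice", "invoice", "other", "other", "invoice"]
y_pred = ["invoice", "other", "other", "other", "invoice"]

# Assumption: missing a real invoice is an unacceptable error.
accuracy, errors, passed = meets_success_criteria(
    y_true, y_pred, min_accuracy=0.8, forbidden=[("invoice", "other")]
)
print(accuracy, errors, passed)  # 0.8 [('invoice', 'other')] False
```

In this example the model reaches the 80% target but still fails, because one prediction falls into an error category that was declared unacceptable.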