When AI practitioners talk about taking their machine learning models and deploying them into real-world environments, they don’t call it deployment. Instead, the term used in the industry is “operationalizing”. This might be confusing for traditional IT operations managers and application developers. Why don’t we simply deploy AI models or put them into production? What does AI operationalization mean, and how is it different from typical application development and IT systems deployment?
The “Inference” Phase of AI Projects and the Diversity of Application
One of the unique things about an AI project versus a traditional application development project is that there isn’t the same build / test / deploy / manage order of operations. Rather, there are two distinct phases of operation: a “training” phase and an “inference” phase. The training phase involves the selection of one or more machine learning algorithms, the identification and selection of appropriate, clean, well-labeled data, the application of the data to the algorithm along with hyperparameter configurations to create an ML model, and then the validation and testing of that model to make sure that it generalizes properly, neither overfitting the training data nor underfitting it. All of those steps comprise just the training phase of an AI project.
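The validation step at the end of the training phase can be sketched in a few lines. This is only an illustration, not a prescribed method: the function name `diagnose_fit` and the cutoffs `min_accuracy` and `max_gap` are assumptions chosen for the example, and real projects would pick metrics and thresholds that suit their own data.

```python
def diagnose_fit(train_accuracy, validation_accuracy,
                 min_accuracy=0.80, max_gap=0.10):
    """Rough check of how a trained model generalizes.

    min_accuracy and max_gap are illustrative assumptions, not standard values.
    """
    # A model that cannot even fit its own training data is underfitting.
    if train_accuracy < min_accuracy:
        return "underfitting"
    # A model that does much better on training data than on held-out
    # validation data has likely memorized the training set.
    if train_accuracy - validation_accuracy > max_gap:
        return "overfitting"
    return "ok"


if __name__ == "__main__":
    print(diagnose_fit(0.99, 0.70))
    print(diagnose_fit(0.60, 0.58))
    print(diagnose_fit(0.92, 0.90))
```

A check like this only says whether the model is ready to leave the laboratory. It says nothing yet about how the model behaves on real-world data.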
On the other hand, the inference phase of an AI project focuses on the application of the ML model to the particular use case, ongoing evaluation to determine if the system is generalizing properly to real-world data, and adjustments to the model, development of new training set data, and hyperparameter configurations to iteratively improve the model. The inference phase can also be used to determine if there are additional use cases for the ML model that are broader than originally specified with the training data. In essence, the training phase happens in the organizational “laboratory” and the inference phase happens in the “real world”.
But the real world is where things get messy and complicated. First of all, as we hinted in our previous article, there’s no such thing as a single platform for machine learning. The universal machine learning / AI platform doesn’t exist because there are so many diverse places in which we can use an ML model to make inferences, do classification, predict values, and all the other problems we are looking for ML systems to solve. We could be using an ML model in an Internet of Things (IoT) device deployed at the edge, or in a mobile application that can operate disconnected from the internet, or in a cloud-based always-on setting, or in a large enterprise server system with private, highly regulated, or classified content, or in desktop applications, or in autonomous vehicles, or in distributed applications, or… you get the picture. Any place where the power of cognitive technology is needed is a place where these AI systems can be used.
This is both empowering and challenging. The data scientist developing the ML model might not have any expectations for how and where the ML model will be used, and so instead of “deploying” this model to a specific system, it needs to be “operationalized” in as many different systems, interfaces, and deployments as necessary. The very same model could be deployed in an IoT driver update as well as a cloud service API call. As far as the data scientists and data engineers are concerned, this is not a problem at all. The specifics of deployment depend on the platforms on which the ML model will be used. But the requirements for the real-world usage and operation (hence the word “operationalization”) of the model are the same regardless of the specific application or deployment.
Requirements for AI Operationalization
Many of the early cognitive technology projects were indeed laboratory-style “experiments” that aimed to identify areas where AI could potentially help, but were never put into production. Many of these efforts were small-scale experiments run by data science organizations. However, to provide real value for the organization, these experiments need to move out of the laboratory and be real, reliable production models. This means that the tools used by data scientists for laboratory-style experiments are not really appropriate for real-world operations.
First and foremost, real-world, production, inference-phase AI projects need to be owned and managed by the line of business or IT operations group that is responsible for the problem that the cognitive technology is solving. Is the AI model trained for fraud analysis? Is it classifying images for content moderation? Is the technology used for security-application facial detection? Is it creating content for social media posts? If so, then the data science organization responsible for crafting the model needs to hand it off to the organization responsible for those activities. This means there needs to be a place where the business or IT organization can monitor, manage, govern, and analyze the results of the ML models to make sure that they are meeting its needs.
The biggest thing these organizations need to realize is that cognitive technologies are probabilistic by their very nature. This means that ML models never produce results with 100% certainty. How will these organizations deal with these almost-but-not-quite-certain results? What is the threshold above which they will accept answers, and what is the fall-back for less-than-certain results? These are things to be considered in the operationalization of ML models in the inference phase that are not relevant during the training phase.
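One way to picture the threshold and fall-back questions above is a small routing function that sits between the model and the business process. This is a minimal sketch under stated assumptions: the names `Decision` and `route_prediction`, the `"human_review"` fall-back, and the default threshold of 0.9 are all hypothetical, and the right values are a business decision rather than a technical one.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    label: str         # the class the model considers most likely
    confidence: float  # the model's probability for that class
    route: str         # "auto" or "human_review" (hypothetical routes)


def route_prediction(probabilities, accept_threshold=0.9):
    """Accept confident predictions; send the rest to a fall-back path.

    `probabilities` maps each class label to the model's probability.
    The 0.9 default is an illustrative assumption.
    """
    label, confidence = max(probabilities.items(), key=lambda item: item[1])
    route = "auto" if confidence >= accept_threshold else "human_review"
    return Decision(label=label, confidence=confidence, route=route)


if __name__ == "__main__":
    print(route_prediction({"fraud": 0.97, "legitimate": 0.03}))
    print(route_prediction({"fraud": 0.55, "legitimate": 0.45}))
```

Raising the threshold sends more cases to people and fewer mistakes into production, so the setting reflects how much each kind of error costs the owning organization.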
An additional consideration for inference-phase operationalization of ML models is that they need to operate on data sets that will not necessarily be as clean as the training sets. Good training sets are clean, de-duped, and well-labeled. Perhaps you’ve trained your system to recognize characters on checks for an image-based deposit system. But in the real world, those images could be of very poor quality, with bad lighting, low resolution, shadows, improper alignment, and objects in the way. How will the operational ML model deal with this bad data? What additional logic needs to be wrapped around the AI system to handle these bad inputs and avoid bad predictive outputs?
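The “additional logic” wrapped around the model can be as simple as a quality gate that rejects obviously bad inputs before the model ever sees them. The sketch below is illustrative only: it assumes an image arrives as a list of rows of grayscale values from 0 to 255, and the names `check_image` and `guarded_predict` as well as every limit are hypothetical. A real check-deposit system would also need to test for alignment, blur, and obstructions.

```python
from statistics import mean

# Illustrative limits; real values would come from testing on production data.
MIN_WIDTH, MIN_HEIGHT = 200, 100
MIN_BRIGHTNESS, MAX_BRIGHTNESS = 40, 220


def check_image(pixels):
    """Return a list of quality problems found in a grayscale image."""
    if not pixels or not pixels[0]:
        return ["empty image"]
    problems = []
    height, width = len(pixels), len(pixels[0])
    if width < MIN_WIDTH or height < MIN_HEIGHT:
        problems.append("resolution too low")
    # Average brightness across all rows (rows are assumed equal in length).
    brightness = mean(mean(row) for row in pixels)
    if brightness < MIN_BRIGHTNESS:
        problems.append("too dark")
    elif brightness > MAX_BRIGHTNESS:
        problems.append("overexposed")
    return problems


def guarded_predict(pixels, model):
    """Call the model only when the input passes the quality checks."""
    problems = check_image(pixels)
    if problems:
        # Bad input: report why instead of returning a bad prediction.
        return {"status": "rejected", "problems": problems}
    return {"status": "ok", "prediction": model(pixels)}
```

Here `model` is any callable that takes the image and returns a prediction. A rejected input can then prompt the user to retake the photo rather than silently producing a wrong result.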
Another operationalization requirement to consider during the inference phase is the compute power necessary to run the model with satisfactory response time. Training compute requirements are not the same as inference-phase requirements. Supervised learning based on deep learning may require an intense amount of compute power and heaps of data and images to create a satisfactory model, but once the model is created the operational compute requirements might be significantly lower. On the flip side, you might be using a simple K-Nearest Neighbors (KNN) algorithm that implements a “lazy” form of learning, requiring little compute at training time but potentially lots of computing horsepower at inference time. Clearly understanding, planning, and providing the right compute power for the inference phase is a critical operationalization requirement.
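The KNN case is easy to see in code. The following is a bare-bones sketch written for illustration (the class name `LazyKNN` is an assumption, and production systems would typically use an optimized library): “training” does nothing more than store the data, while every prediction has to measure the distance to every stored point.

```python
import math
from collections import Counter


class LazyKNN:
    """Minimal K-Nearest Neighbors classifier, for illustration only."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, points, labels):
        # "Training" is cheap: the data is simply remembered.
        self.points = list(points)
        self.labels = list(labels)
        return self

    def predict(self, query):
        # Inference is where the work happens: one distance computation
        # per stored point, so cost grows with the size of the training set.
        distances = sorted(
            (math.dist(point, query), label)
            for point, label in zip(self.points, self.labels)
        )
        votes = Counter(label for _, label in distances[: self.k])
        return votes.most_common(1)[0][0]


if __name__ == "__main__":
    points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
    labels = ["a", "a", "a", "b", "b", "b"]
    model = LazyKNN(k=3).fit(points, labels)
    print(model.predict((0.2, 0.2)))
    print(model.predict((5.5, 5.5)))
```

With six stored points the cost is negligible, but with millions of points each prediction scans the whole data set, which is exactly the kind of inference-time load that has to be planned for.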
Challenges of AI Operationalization
The challenge of meeting these operationalization requirements is that the tools, data, and practices of the data science organization are not the same as those of the operationalization organization. These organizations often work with their own proprietary tools: data notebooks and data science-oriented tools on the data science side during the training phase, and runtime environments, big data infrastructure, and IDE-based development ecosystems on the operations side. There’s no easy way to bridge these technologies, and so the transition from the training phase in the laboratory to the inference phase in the real world becomes a struggle.
Another challenge is that of interpretability / explainability. As we’ve discussed frequently, many ML algorithms, especially neural networks, can provide high accuracy at the expense of explainability. That is to say that the model will provide results but with no clear explanation as to why or how it arrived at those results. While this might be acceptable in the training phase, the line of business might not be willing to accept the determinations of a “black box” that is denying loan applications, classifying potentially fraudulent activities, or transcribing speech into text without explainability. In this light, operationalization of ML models might also require a tandem process by which results can either be explained or there’s an ensemble model somewhere that provides a path of interpretability.
Yet another challenge has to do with data ownership. Many data science personnel feel content working with local data sets on their personal machines, but this doesn’t work well in an operational context. The result is that these organizations operate in “silos”, leading to significant inefficiencies and duplicated data. While some seek the “perfect platform” to bridge these silos, the truth is that no such perfect platform exists. Instead, the answer is optimized processes and procedures that focus not just on AI model development and training or on model operationalization, but also on the critical transition between these two phases.
Understanding all the complexity and planning for this is part of what makes for successful AI, ML, and cognitive technology projects. This is one of the core aspects of our Cognilytica AI & ML Training and Certification that helps organizations not only accomplish their cognitive technology project goals, but do them right and with the lowest risk possible.