The Steps for a Machine Learning Project


Are you in charge of a machine learning project and don’t know where to start? Or maybe you know some of the tools and technologies for the machine learning work itself, but don’t know the steps you should follow to maximize your chances of success? Fortunately, this article lays out the steps of a machine learning project.

Follow an approachable, step-by-step methodology for running AI and machine learning projects that is based on real-world experience and best practices. Why make mistakes when others have already made them for you? In this article, we share the best-practice approach for machine learning projects adopted by machine learning practitioners at large and small organizations, government agencies, academic institutions, and more.

If you’re looking for an AI, machine learning, or even a data science project management framework, this approach is for you. If you manage ML and AI project teams, then this approach combined with more focused project management training will get you on your way!

How do you develop an ML project? 

If you’ve never run or managed a machine learning project before, you should know that machine learning projects are really data projects. If you’ve already run multiple ML projects, then you know that the primary challenges have nothing to do with learning tools or implementing Python code. Rather, the biggest problems come from wrangling data that never seems to be in the right shape, clean enough, or available when you need it.

Fortunately, a good approach exists for running ML projects that deals with this reality of data, and adopting its steps helps even experienced ML project managers who have already run their own AI and ML projects. The challenge is that you can’t apply application development-centric approaches or agile methodologies directly to ML and AI projects, because ML projects aren’t application development projects. AI and machine learning projects are not built upon code and functionality. They are built upon data. Therefore, ML projects need to be run like data projects. However, we can’t afford long ML development cycles that take many months. We want the benefits of agile while adopting a data-centric approach. So, if you are supposed to run AI projects more like data projects, but also be agile and iterative, what are the ML project management steps that make this work?

How to approach a machine learning project

Fortunately, there is a robust, iterative, and agile approach that provides the steps for a machine learning project in a reliable, reproducible manner with a high degree of success. This approach is the Cognitive Project Management for AI (CPMAI) methodology, which has been successfully implemented and adopted by private and public sector organizations. The methodology builds upon the decades-old, data-centric CRISP-DM and incorporates an agile approach to provide short, iterative sprints for projects.

There are six steps for a machine learning project, also referred to as phases, in the CPMAI methodology. Below is a high-level overview of the six primary CPMAI phases and their objectives, which guide ML project structure and planning:

Figure: The steps for a machine learning project follow six main phases, from the CPMAI methodology. Source: Cognilytica

The six CPMAI steps, or phases, include:

  • CPMAI Phase I: Business Understanding – “Mapping the business problem to the AI solution.”
  • CPMAI Phase II: Data Understanding – “Getting a hold of the right data to address the problem.”
  • CPMAI Phase III: Data Preparation – “Getting the data ready for use in a data-centric AI Project.”
  • CPMAI Phase IV: Model Development – “Producing an AI solution that addresses the business problem.”
  • CPMAI Phase V: Model Evaluation – “Determining whether the AI solution meets the real-world and business needs.”
  • CPMAI Phase VI: Model Operationalization – “Putting the AI solution to use in the real-world, and iterating to continue its delivery of value.”

Machine Learning Project Plan

Following the CPMAI approach gives you a machine learning project plan template: a set of steps you can follow for project success. These steps are described in further detail below:

First step – Phase I: Business Understanding

The first step for any AI project should always be understanding the business requirements and needs. After all, if you’re not solving a real business problem for your organization, then why are you doing the project at all? In phase one of your project iteration, focus on understanding the project objectives and requirements from a business perspective, then convert this knowledge into an AI and ML problem definition and a preliminary plan designed to achieve the objectives.

It’s important to remember that AI is not the solution to every problem. During this first phase you should determine if AI is in fact the right solution to your problem. If it is, then figure out what portions of the project require AI and ML. Once determined, you can outline your criteria for project success, understand what data-centric problem you’re solving, determine what skill sets are needed for successful project completion, and define KPIs and other critical factors for business success.

Second step – Phase II: Data Understanding

The second step for any AI project is understanding your data. The most important part here is understanding what data is required to address the business problem, whether or not that data is available, and what format(s) your data is in. Data is what fuels your AI projects so you should make sure you have a firm understanding of your data before getting too far along in your project.

In the Data Understanding step, you should look to address three key data requirements for AI projects: 

  1. Availability and sources of data to meet business needs
  2. Quality of that data and need for enhancement or augmentation
  3. Environments in which data is needed for training and real-world inference 

In this step, determine what data is necessary to achieve the business objectives you laid out in step one, assess the quantity and quality of your data, identify what (if any) external data will be needed, and plan for ongoing data gathering and preparation.

Since AI projects are heavily data projects, a firm understanding of the data environment and availability of data is absolutely essential to AI project success. The number of AI projects that fail due to a lack of understanding of data availability, quality, or other related factors is significant. Don’t skip or shortcut this step!
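As a rough illustration of what a first pass at Data Understanding can look like, here is a minimal sketch that profiles a small dataset for missing values. The column names and sample rows are hypothetical and stand in for your own data source; a real project would likely use a dedicated profiling tool.

```python
import csv
import io

# Hypothetical sample data; in practice this would come from your own sources.
SAMPLE_CSV = """customer_id,age,country,churned
1,34,US,0
2,,DE,1
3,51,,0
4,29,US,
"""


def profile_columns(rows):
    """Return per-column row count, missing-value count, and missing rate."""
    profile = {}
    for row in rows:
        for column, value in row.items():
            stats = profile.setdefault(column, {"rows": 0, "missing": 0})
            stats["rows"] += 1
            # Treat absent fields and blank strings as missing.
            if value is None or value.strip() == "":
                stats["missing"] += 1
    for stats in profile.values():
        stats["missing_rate"] = stats["missing"] / stats["rows"] if stats["rows"] else 0.0
    return profile


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
report = profile_columns(rows)
for column, stats in report.items():
    print(f"{column}: {stats['missing']}/{stats['rows']} missing ({stats['missing_rate']:.0%})")
```

A report like this gives an early signal on data availability and quality before any modeling work begins.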

Third step – Phase III: Data Preparation

The third step in an AI project is Data Preparation. Once you have figured out what problem you are solving and what data you have, next you need to make sure the data you have is usable for your project. In this step you need to do tasks such as data cleansing, data aggregation, data augmentation, data labeling, data normalization, data transformation, and any other activities needed for structured, unstructured, and semi-structured data.

Remember that “garbage in is garbage out”. Don’t feed your system with trash data and then expect anything less than trash results. This step addresses three key data preparation requirements for AI projects: wrangling data from the sources and transforming it to its required state, cleansing the data to eliminate critical data flaws, and augmenting and enhancing the data as needed, including data labeling to add the meaning and context that AI systems need to learn properly from the data. In this step you should be addressing how your data needs to be transformed to meet your project requirements, the means by which data quality can be continuously monitored and evaluated, if and how you will be using and modifying third-party data, data labeling requirements, and the creation of your data engineering pipelines.
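To make two of these tasks concrete, the following is a minimal sketch of data cleansing (dropping records with missing required fields) and data normalization (min-max scaling). The field names and values are hypothetical, and dropping incomplete records is only one of several possible cleansing strategies.

```python
def clean_records(records, required_fields):
    """Drop records that are missing any required field."""
    return [
        record
        for record in records
        if all(record.get(field) not in (None, "") for field in required_fields)
    ]


def min_max_normalize(values):
    """Scale numeric values to the [0, 1] range; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(value - lo) / (hi - lo) for value in values]


# Hypothetical raw records; the second one is missing "age".
raw = [
    {"age": "34", "income": "52000"},
    {"age": "", "income": "61000"},
    {"age": "51", "income": "88000"},
    {"age": "29", "income": "40000"},
]

cleaned = clean_records(raw, ["age", "income"])
ages = min_max_normalize([float(record["age"]) for record in cleaned])
print(len(cleaned), "records kept; normalized ages:", ages)
```

In a real pipeline these functions would be steps in a repeatable data engineering workflow, so the same transformations apply to training data and to data seen in production.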

Fourth step – Phase IV: Model Development

The fourth step of your AI project should be the creation and development of machine learning models. This includes model technique selection and application, model training, model hyperparameter setting and adjustment, model validation, ensemble model development and testing, algorithm selection, and model optimization. By the time you are ready to build your very first model you’ve already determined the business needs, the data requirements, and gotten the data in the right format and quality. If you haven’t, then you need to revisit these steps before moving forward.

In the Model Development phase you need to determine appropriate algorithm selection, settings, and hyperparameters. It’s also important to determine if you will use third-party models and/or extend those models. You also need to measure the performance of model training and model optimization activities, match the model performance against business requirements, and select the appropriate infrastructure for model training.

The first iterations of models should be quick and short enough that a model is produced within the first week or two of the project iteration. Remember, when taking an agile approach you want to be able to do things in short, iterative sprints. This should not be a months-long process; after all, this is not a waterfall approach to creating your model.
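One way to get a first model quickly is to start with a trivial baseline and a hold-out split, then improve from there. The sketch below is an illustration only, not part of the CPMAI methodology itself: it uses a majority-class baseline, and the labeled examples are hypothetical.

```python
import random
from collections import Counter


def train_test_split(examples, test_fraction=0.25, seed=0):
    """Shuffle a copy of the examples and split into train and test sets."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def fit_majority_baseline(train):
    """'Train' the simplest possible model: always predict the most common label."""
    labels = Counter(label for _, label in train)
    return labels.most_common(1)[0][0]


def baseline_accuracy(predicted_label, test):
    """Fraction of test examples whose label equals the constant prediction."""
    return sum(1 for _, label in test if label == predicted_label) / len(test)


# Hypothetical (features, label) pairs: six negatives and two positives.
examples = [([0.1], 0), ([0.2], 0), ([0.3], 0), ([0.4], 0),
            ([0.5], 0), ([0.6], 0), ([0.9], 1), ([1.0], 1)]

train, test = train_test_split(examples)
baseline = fit_majority_baseline(train)
score = baseline_accuracy(baseline, test)
print(f"baseline predicts {baseline}, hold-out accuracy {score:.2f}")
```

Any model developed in later sprints should beat this baseline on the same hold-out data; if it does not, that is a signal to revisit the earlier data phases.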

Fifth step – Phase V: Model Evaluation

Once a model is created, it needs to be evaluated to make sure it performs according to the business requirements and other factors set in the previous steps of a machine learning project. In this fifth phase of your AI project you are now ready for model evaluation. From an AI perspective this includes model metric evaluation, model precision and accuracy, determination of false positive and negative rates, key performance indicator metrics, model performance metrics, model quality measurements, and a determination as to whether or not the model is suitable for meeting the goals or whether earlier phases should be iterated upon to reach those goals.

In this step you should be evaluating and testing your model, evaluating model performance measurement and improvement, and determining needs for ongoing model iteration. You should also be determining if the model meets requirements for accuracy, precision, and other key metrics, evaluating concerns on overfit and underfit of models, evaluating your models against business Key Performance Indicators (KPIs) as well as determining means for model monitoring, iteration and versioning.
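The metrics named above can be computed directly from a model’s predictions. Here is a minimal sketch for a binary classifier; the labels, predictions, and minimum thresholds are hypothetical, and your own thresholds should come from the business requirements set in Phase I.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1
        elif predicted == positive:
            fp += 1
        elif actual == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def _ratio(numerator, denominator):
    """Divide safely, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def evaluate(y_true, y_pred):
    """Compute common classification metrics from predictions."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "precision": _ratio(tp, tp + fp),
        "recall": _ratio(tp, tp + fn),
        "false_positive_rate": _ratio(fp, fp + tn),
        "false_negative_rate": _ratio(fn, fn + tp),
    }


def unmet_requirements(metrics, minimums):
    """Return the metrics that fall below their required minimum."""
    return {name: metrics[name] for name, floor in minimums.items() if metrics[name] < floor}


# Hypothetical ground truth, predictions, and business thresholds.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
MINIMUMS = {"precision": 0.70, "recall": 0.80}

metrics = evaluate(y_true, y_pred)
unmet = unmet_requirements(metrics, MINIMUMS)
print(metrics)
print("Unmet requirements:", unmet or "none")
```

If any requirement is unmet, that is the cue to iterate on an earlier phase instead of moving on to operationalization.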

Just like QA and testing in the non-AI world, model evaluation is crucial to making sure that the AI solution meets your business needs. Far too many organizations shortchange their model evaluation step, which causes models to fail. Don’t skip this step – evaluate your model!

Final step – Phase VI: Model Operationalization

The last step of each AI project lifecycle iteration is putting the model you just created into operation. In step 6 of your AI project make sure to address model versioning and iteration, model deployment, model monitoring, model staging in development and production environments, and other aspects of getting the model in a position to provide value to meet the stated purpose. 

The key needs to address during Phase VI, Model Operationalization, include model deployment, model management, and model governance. Ask how this model will be used in production and operational environments, determine the data flow requirements for the model to be useful, set requirements for performance, and determine ongoing iteration requirements.

Putting the steps for a machine learning project into action

Using the above approach, you can complete one cycle of your machine learning project. However, ML projects are never “set it and forget it” activities. You need to continue to evaluate model performance and repeat the phases of the CPMAI methodology periodically to sustain AI project success. For data-driven projects, especially AI projects, real-world data continues to change, which means your model performance will change over time as well. Organizations that haven’t budgeted for model monitoring and model maintenance will quickly realize that unless they devote resources to a model once it is in production, it will decay over time and become unusable.
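A simple form of model monitoring is to compare recent production accuracy against the accuracy measured at evaluation time. The sketch below assumes you can later observe the true outcome for each prediction; the function name, tolerance, and sample numbers are hypothetical.

```python
def needs_retraining(baseline_accuracy, recent_outcomes, tolerance=0.05):
    """Flag model decay when recent accuracy drops too far below the baseline.

    recent_outcomes is a list of booleans, True where a production
    prediction matched the outcome that was observed later.
    """
    if not recent_outcomes:
        return False  # No evidence yet, so nothing to flag.
    recent_accuracy = sum(recent_outcomes) / len(recent_outcomes)
    return (baseline_accuracy - recent_accuracy) > tolerance


# Hypothetical: the model scored 0.90 at evaluation time.
stable_window = [True] * 9 + [False] * 1    # 0.90 recent accuracy
decayed_window = [True] * 8 + [False] * 2   # 0.80 recent accuracy

print("stable window flagged:", needs_retraining(0.90, stable_window))
print("decayed window flagged:", needs_retraining(0.90, decayed_window))
```

A flag like this is a trigger to start another iteration of the phases, beginning with a fresh look at the data.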

The steps outlined above will give you a great starting point for your AI projects. For more detail, download this step-by-step approach checklist to use as a handy template for every ML and AI project you plan to run.

Download the Step by Step Checklist for AI Projects
