How to apply CRISP-DM to AI and big data projects
Those who run AI and machine learning projects know that they are data projects more than they are application development projects. After all, it’s the data that AI and machine learning systems need to be trained on and “learn” from. The primary challenges in AI and machine learning projects have little to do with learning tools or implementing Python code. Rather, the biggest problems come from issues related to data. Organizations have sought approaches to apply best practices to these data-centric AI projects, and the well-known CRISP-DM approach is one such method. So, how can you apply CRISP-DM to AI and big data projects?
If AI and big data projects depend so heavily on data, it makes sense to manage them with data-centric, step-by-step approaches.
An Overview of CRISP-DM
Before the most recent wave of AI and machine learning interest and hype, which started in the 2010s, organizations with data-centric projects adopted methodologies built around those needs. Because the focus at the time was on data mining and data analytics, methodologies centered on these topics emerged. Project owners needed an approach that covered data discovery, data preparation, modeling, evaluation, and delivery.
Project owners and leaders wanted a more iterative approach to data mining and analytics, well before agile methodology for big data was a recognized concept. In response, a consortium of five vendors developed the Cross-Industry Standard Process for Data Mining (CRISP-DM), which applies continuous iteration to the data-intensive steps of a data mining project. Specifically, CRISP-DM starts with an iterative loop between business understanding and data understanding. That loop hands off to a second iterative loop between data preparation and modeling, which passes its results to an evaluation phase. Evaluation then sends its results either forward to deployment or back to business understanding. The whole approach forms a cyclic, iterative loop, which leads to continuous data preparation, modeling, and evaluation.
What are the 6 steps in CRISP-DM?
There are six major steps in the CRISP-DM methodology.
- Business Understanding
The first step is business understanding. This focuses on understanding the project objectives, requirements, and goals from a business perspective. After all, if you’re not solving a real business problem, then why are you doing the project at all?
- Data Understanding
The second step is data understanding. Once you know what problem you’re solving, you need to make sure you have the data needed to apply to your project. This step includes your initial data collection needs, identifying potential data quality problems, and understanding what data is still needed.
- Data Preparation
The third step is data preparation. Once you have the data needed, you need to prepare this data in order to use it for your project.
- Modeling
The fourth step is modeling. In this step you’ll need to evaluate, select, and apply the appropriate modeling techniques.
- Evaluation
The fifth step is the CRISP-DM evaluation phase. You need to make sure that you’re getting the results you seek from your data and model. In this step you test your model to make sure it behaves the way you expect.
- Deployment
The sixth step is the CRISP-DM deployment phase. Once you put your model into the real world, you need to make sure it behaves the way you expect, especially with regard to new real-world data that it has never seen before.
The image below provides a visual overview of the six major steps in the CRISP-DM methodology and how they are related.
Source: CRISP-DM 1.0
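One way to make the flow between the six steps concrete is to write it down as a small directed graph. The following is a minimal sketch in Python, based only on the transitions described in this article. The names `Phase`, `TRANSITIONS`, and `is_valid_path` are hypothetical and chosen for illustration; they are not part of any CRISP-DM specification or tool. As a simplifying assumption, the outer cycle is modeled as the path from evaluation back to business understanding, so deployment has no outgoing transition.

```python
from enum import Enum


class Phase(Enum):
    """The six CRISP-DM steps, in the order listed above."""
    BUSINESS_UNDERSTANDING = "Business Understanding"
    DATA_UNDERSTANDING = "Data Understanding"
    DATA_PREPARATION = "Data Preparation"
    MODELING = "Modeling"
    EVALUATION = "Evaluation"
    DEPLOYMENT = "Deployment"


# Allowed moves between phases, as described in the overview:
# two iterative loops, then evaluation splitting toward deployment
# or back to business understanding. (Illustrative assumption.)
TRANSITIONS = {
    Phase.BUSINESS_UNDERSTANDING: {Phase.DATA_UNDERSTANDING},
    Phase.DATA_UNDERSTANDING: {Phase.BUSINESS_UNDERSTANDING, Phase.DATA_PREPARATION},
    Phase.DATA_PREPARATION: {Phase.MODELING},
    Phase.MODELING: {Phase.DATA_PREPARATION, Phase.EVALUATION},
    Phase.EVALUATION: {Phase.BUSINESS_UNDERSTANDING, Phase.DEPLOYMENT},
    Phase.DEPLOYMENT: set(),
}


def is_valid_path(path):
    """Return True if every consecutive pair of phases is an allowed move."""
    return all(nxt in TRANSITIONS[cur] for cur, nxt in zip(path, path[1:]))


if __name__ == "__main__":
    straight_through = list(Phase)  # enum members in definition order
    print(is_valid_path(straight_through))
    # Skipping data understanding and preparation is not an allowed move.
    print(is_valid_path([Phase.BUSINESS_UNDERSTANDING, Phase.MODELING]))
```

A project plan could use a check like this to flag work that jumps straight from business understanding to modeling without passing through the data steps.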
So given the above, do we have enough information to know how to apply CRISP-DM to AI and big data projects?
Learning how to apply CRISP-DM to AI and big data projects
While CRISP-DM has been around for decades, the methodology has seen no further development since its initial release. About fifteen years ago there were rumors that a second version was underway, but nothing was ever published. CRISP-DM predates “big data”, and although people tried to use it as a big data project methodology, they found that it falls short because it does not account for the realities of modern big data.
Also, as the image above shows, applying CRISP-DM presents some challenges to the way modern organizations operate. A main challenge is making CRISP-DM work alongside existing Agile methodologies. From the perspective of Agile, the entire CRISP-DM loop is contained within the development and deployment spheres, and it also touches the business requirements and testing portions of the Agile loop. Indeed, once Agile enters the picture, the two independent cycles of application-focused Agile AI development and data-focused methodologies become intertwined in complex ways.
Another CRISP-DM disadvantage is that there is no formal CRISP-DM certification. This means that AI project managers, big data project managers, and other project leaders are left to figure this out on their own. When creating a big data project plan, these managers have to work out if and how CRISP-DM is supposed to fit into their plan. The lack of support, the lack of continued improvement and iteration, and the lack of ownership make this methodology challenging to use.
CPMAI: Best Practices approach to Manage AI and Data Projects
So, if we can’t use CRISP-DM to manage AI and data projects, then what can we use? The answer is a methodology that starts from the same root of business requirements as CRISP-DM and splits into two simultaneous iterative loops of Agile project development and Agile-enabled data methodologies. We can think of this as an Agile and iterative data-centric project management approach. This step by step approach is the Cognitive Project Management for AI (CPMAI) methodology.
The CPMAI methodology is a vendor-neutral, data-centric, AI-specific, iterative methodology for running and managing AI, ML, and advanced data projects. Far too often, organizations want to be iterative and agile but deploy a waterfall model for machine learning. CPMAI makes needed enhancements to Agile and CRISP-DM methodologies to meet AI-specific requirements that allow you to develop a model quickly. Indeed, CPMAI is better suited to be the big data project management methodology that CRISP-DM didn’t really live up to.
The mantra for CPMAI is “Think Big. Start Small. Iterate Often.” By extending and enhancing these current methodologies rather than creating a new approach, CPMAI can be implemented immediately at organizations with already-running Agile teams and already-running data projects.
As we have seen at many organizations, introducing something new and foreign can create instant resistance. So the key is to provide a blended approach that simultaneously delivers the expected results to the organization, using familiar and approachable terminology and concepts, and provides an approach for continued iterative development at the lowest risk possible. At the end of the day, successfully running and managing an AI project with an appropriate AI project methodology should be everyone’s goal. CPMAI training and certification provide the necessary details to know how to apply CRISP-DM to AI and big data projects.
Interested in learning more about CPMAI? Sign up for CPMAI training and certification and get the project management training needed to successfully run AI and big data projects.