CRISP-DM Twenty Years Later: From Data Mining Processes to Data Science Trajectories
- 27 December 2019
- journal article
- research article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Knowledge and Data Engineering
- Vol. 33 (8), 3048-3061
- https://doi.org/10.1109/tkde.2019.2962680
Abstract
CRISP-DM (CRoss-Industry Standard Process for Data Mining) has its origins in the second half of the nineties and is thus about two decades old. According to many surveys and user polls it is still the de facto standard for developing data mining and knowledge discovery projects. However, undoubtedly the field has moved on considerably in twenty years, with data science now the leading term being favoured over data mining. In this paper we investigate whether, and in what contexts, CRISP-DM is still fit for purpose for data science projects. We argue that if the project is goal-directed and process-driven the process model view still largely holds. On the other hand, when data science projects become more exploratory the paths that the project can take become more varied, and a more flexible model is called for. We suggest what the outlines of such a trajectory-based model might look like and how it can be used to categorise data science projects (goal-directed, exploratory or data management). We examine seven real-life exemplars where exploratory activities play an important role and compare them against 51 use cases extracted from the NIST Big Data Public Working Group. We anticipate this categorisation can help project planning in terms of time and cost characteristics.
Funding Information
- EU
- Spanish MINECO (RTI2018-094403-B-C3)
- Generalitat Valenciana (PROMETEO/2019/098)
- Instituto Nacional de Ciberseguridad
- European Commission (CT-EX2018D335821-101)
- UPV (PAID-06-18)
- FLI (RFP2-152)