Machine Learning and Data Cleaning: Which Serves the Other?
Open Access
- 21 July 2022
- journal article
- research article
- Published by Association for Computing Machinery (ACM) in Journal of Data and Information Quality
- Vol. 14 (3), 1-11
- https://doi.org/10.1145/3506712
Abstract
The last few years have witnessed significant advances in building automated or semi-automated data quality, data cleaning, and data integration systems powered by machine learning (ML). In parallel, the large-scale deployment of ML systems in business, science, environment, and various other areas has made clear how strongly reliable predictions and insights depend on the quality of the input data to these ML models. This dual relationship between ML and data cleaning has been addressed by many recent research works under terms such as "Data cleaning for ML" and "ML for automating data cleaning and data preparation". In this article, we highlight this symbiotic relationship between ML and data cleaning and discuss a few challenges that require collaborative efforts of multiple research communities.
This publication has 45 references indexed in Scilit:
- Classification in the Presence of Label Noise: A Survey. IEEE Transactions on Neural Networks and Learning Systems, 2013
- Guided data repair. Proceedings of the VLDB Endowment, 2011
- Sampling the repairs of functional dependency violations under hard constraints. Proceedings of the VLDB Endowment, 2010
- Towards certain fixes with editing rules and master data. Proceedings of the VLDB Endowment, 2010
- Reasoning about record matching rules. Proceedings of the VLDB Endowment, 2009
- Data fusion. Proceedings of the VLDB Endowment, 2009
- Discovering data quality rules. Proceedings of the VLDB Endowment, 2008
- Complexity of Consistent Query Answering in Databases Under Cardinality-Based and Incremental Repair Semantics. Lecture Notes in Computer Science, 2006
- Learning object identification rules for information integration. Information Systems, 2001
- Tane: An Efficient Algorithm for Discovering Functional and Approximate Dependencies. The Computer Journal, 1999