Machine Learning and Data Cleaning: Which Serves the Other?

Abstract
The last few years witnessed significant advances in building automated or semi-automated data quality, data cleaning and data integration systems powered by machine learning (ML). In parallel, large deployment of ML systems in business, science, environment and various other areas started to realize the strong dependency on the quality of the input data to these ML models to get reliable predictions or insights. That dual relationship between ML and data cleaning has been addressed by many recent research works under terms such as "Data cleaning for ML" and "ML for automating data cleaning and data preparation". In this article, we highlight this symbiotic relationship between ML and data cleaning and discuss few challenges that require collaborative efforts of multiple research communities.

This publication has 45 references indexed in Scilit: