Missing data handling for machine learning models
Published: 1 June 2021
IAES International Journal of Robotics and Automation (IJRA), Volume 10; https://doi.org/10.11591/ijra.v10i2.pp123-132
Abstract: This paper presents a novel algorithm for handling missing data in the pre-processing stage of machine learning. As a running example, it adopts a model that helps lenders evaluate home loans on numerous factors by learning from available user data. If one of the factors is missing for a person in the dataset, currently used methods delete the whole entry, which reduces the size of the dataset and affects the accuracy of the machine learning model. The novel algorithm avoids losing entries with missing factors by breaking the dataset into multiple subsets, building a separate machine learning model for each subset, and then combining those models into one. In this manner, the model makes use of all available data and neglects only the missing values. Overall, the new algorithm improved prediction accuracy by 5 percentage points, from 93% to 98%, in the home loan example.
Keywords: machine learning model / missing / building / loan / algorithm / home
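The split, train, and combine idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not say how subsets are formed or how the models are combined, so the sketch assumes one subset per observed pattern of available factors and combines the models by routing each query to the model that matches its available factors. The names `pattern`, `CentroidModel`, and `SubsetEnsemble`, the nearest-centroid learner, and the two-factor toy data (income, credit score) are all hypothetical.

```python
from collections import defaultdict


def pattern(row):
    """Indices of the factors that are present (not None) in a row."""
    return tuple(i for i, v in enumerate(row) if v is not None)


class CentroidModel:
    """Tiny nearest-centroid classifier; a stand-in for any real learner."""

    def __init__(self, features):
        self.features = features  # factor indices this model uses
        self.centroids = {}

    def fit(self, rows, labels):
        sums = defaultdict(lambda: [0.0] * len(self.features))
        counts = defaultdict(int)
        for row, label in zip(rows, labels):
            for k, i in enumerate(self.features):
                sums[label][k] += row[i]
            counts[label] += 1
        self.centroids = {
            lab: [s / counts[lab] for s in sums[lab]] for lab in sums
        }
        return self

    def predict(self, row):
        def dist(centroid):
            return sum(
                (row[i] - centroid[k]) ** 2
                for k, i in enumerate(self.features)
            )
        return min(self.centroids, key=lambda lab: dist(self.centroids[lab]))


class SubsetEnsemble:
    """One model per missingness pattern (an assumed combination rule)."""

    def fit(self, rows, labels):
        self.models = {}
        for feats in sorted({pattern(r) for r in rows}):
            if not feats:
                continue  # a row with no factors at all carries no signal
            # Train on every row that has at least these factors, so no
            # entry is dropped merely because some other factor is missing.
            idx = [j for j, r in enumerate(rows)
                   if set(feats) <= set(pattern(r))]
            self.models[feats] = CentroidModel(feats).fit(
                [rows[j] for j in idx], [labels[j] for j in idx]
            )
        return self

    def predict(self, row):
        available = set(pattern(row))
        usable = [f for f in self.models if set(f) <= available]
        if not usable:
            raise ValueError("no trained model covers the available factors")
        # Prefer the model that uses the most of the available factors.
        return self.models[max(usable, key=len)].predict(row)


# Hypothetical toy data: (income, credit score); None marks a missing factor.
rows = [(80.0, 700.0), (75.0, None), (30.0, 520.0),
        (None, 500.0), (25.0, None), (None, 720.0)]
labels = ["approve", "approve", "reject", "reject", "reject", "approve"]

model = SubsetEnsemble().fit(rows, labels)
```

With this toy data, three models are trained (both factors, income only, credit score only), and a query such as `model.predict((70.0, None))` is answered by the income-only model instead of being discarded. Deleting incomplete entries would have left only two of the six rows for training.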