Open Access

Missing data handling for machine learning models

Karim H. Erian, Pedro H. Regalado, James M. Conrad

Abstract: This paper presents a novel algorithm for handling missing data in the pre-processing stage of machine learning. As a running example, it adopts a model built to help lenders evaluate home loans on numerous factors by learning from available user data. If one of the factors is missing for a person in the dataset, currently used methods delete the whole entry, which reduces the size of the dataset and affects the accuracy of the machine learning model. The novel algorithm avoids losing entries with missing factors by breaking the dataset into multiple subsets, building a separate machine learning model for each subset, and then combining these models into one. In this manner, the model makes use of all available data and ignores only the missing values. In the home loan example, the new algorithm improved prediction accuracy by 5 percentage points, from 93% to 98%.
Keywords: machine learning model / missing / building / loan / algorithm / home
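
The split, train, and combine idea described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say how the subsets are formed or how the models are combined, so the sketch assumes that subsets are defined by which features are observed and that the combined model simply routes each query to the sub-model matching its pattern. The names MISSING, CentroidModel, and PatternEnsemble are hypothetical, the nearest-centroid learner is a stand-in for any real classifier, and the loan rows are made-up toy values.

```python
from collections import defaultdict
from math import dist

MISSING = None  # assumption: a missing value is represented by None


def observed_pattern(row):
    """Return the indices of the features that are present in a row."""
    return tuple(i for i, v in enumerate(row) if v is not MISSING)


class CentroidModel:
    """Tiny nearest-centroid classifier, a stand-in for any real learner."""

    def fit(self, X, y):
        grouped = defaultdict(list)
        for features, label in zip(X, y):
            grouped[label].append(features)
        # One centroid (per-feature mean) for each class label.
        self.centroids = {
            label: [sum(col) / len(col) for col in zip(*rows)]
            for label, rows in grouped.items()
        }
        return self

    def predict_one(self, features):
        # Pick the class whose centroid is closest in Euclidean distance.
        return min(self.centroids, key=lambda c: dist(features, self.centroids[c]))


class PatternEnsemble:
    """One sub-model per observed-feature pattern, combined by routing."""

    def fit(self, rows, labels):
        self.models = {}
        for pattern in {observed_pattern(r) for r in rows}:
            if not pattern:
                continue  # a row with no observed features carries no signal
            # Assumption: each sub-model trains on every row that has at
            # least these features observed, so no entry is thrown away.
            X, y = [], []
            for row, label in zip(rows, labels):
                if all(row[i] is not MISSING for i in pattern):
                    X.append([row[i] for i in pattern])
                    y.append(label)
            self.models[pattern] = CentroidModel().fit(X, y)
        return self

    def predict_one(self, row):
        pattern = observed_pattern(row)
        if pattern not in self.models:
            raise ValueError(f"no sub-model for pattern {pattern}")
        return self.models[pattern].predict_one([row[i] for i in pattern])


# Toy data (made up): (income in thousands, credit score), some values missing.
rows = [(80.0, 720.0), (75.0, None), (30.0, 580.0),
        (None, 560.0), (90.0, 700.0), (25.0, None)]
labels = ["approve", "approve", "reject", "reject", "approve", "reject"]

model = PatternEnsemble().fit(rows, labels)
print(model.predict_one((70.0, None)))   # approve (income-only sub-model)
print(model.predict_one((None, 600.0)))  # reject (score-only sub-model)
```

Routing by pattern is only one way to combine the sub-models; voting or weighting across overlapping sub-models would be an equally plausible reading of the abstract.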

Journal: IAES International Journal of Robotics and Automation (IJRA)