Optimizing Random Forest using Genetic Algorithm for Heart Disease Classification

Abstract
Heart disease is a leading cause of death worldwide, and effective predictive systems are needed to support the treatment of affected patients. This study aimed to improve the accuracy of Random Forest (RF) in predicting and classifying heart disease. The experiments were designed to select the best RF parameters by optimizing them with a Genetic Algorithm (GA). The RF parameters serve as the initial population of the GA and then pass through its operators: selection, crossover, and mutation. The chromosome that survives this evolution represents the best RF parameter set. The best parameters are stored in the hall of fame module of the DEAP library and used for classification with RF. The optimized RF parameters are max_depth, max_features, n_estimator, and min_sample_leaf. For comparison, RF was also run with default parameters, random search, and grid search, which achieved accuracies of 82.5%, 82%, and 83%, respectively. RF+GA achieved 85.83%; this result depends on the GA parameters, namely the number of generations, population size, crossover rate, and mutation rate. This shows that a Genetic Algorithm can be used to optimize the parameters of Random Forest.
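
The evolutionary loop summarized in the abstract can be illustrated with a minimal sketch. It is not the study's implementation: it uses only the Python standard library rather than DEAP, and the bounds in SEARCH_SPACE, the helper names, the operator choices (tournament selection, one-point crossover, uniform resampling mutation), and the default rates are all assumptions made for illustration. The toy_fitness function is a hypothetical stand-in; in a real experiment it would presumably be replaced by a function that trains a Random Forest with the decoded parameters and returns its cross-validated accuracy.

```python
import random

# Hypothetical search space: parameter label -> (low, high) inclusive bounds.
# The labels follow the abstract; the bounds are illustrative assumptions.
SEARCH_SPACE = {
    "max_depth": (2, 20),
    "max_features": (1, 10),
    "n_estimator": (10, 300),
    "min_sample_leaf": (1, 10),
}
BOUNDS = list(SEARCH_SPACE.values())


def random_individual(rng):
    """Create one chromosome: a list with one integer gene per parameter."""
    return [rng.randint(lo, hi) for lo, hi in BOUNDS]


def toy_fitness(individual):
    """Stand-in for model accuracy (assumption): best at an arbitrary target."""
    target = [8, 4, 120, 2]
    return -sum(abs(g - t) / (hi - lo)
                for g, t, (lo, hi) in zip(individual, target, BOUNDS))


def tournament(population, fitnesses, rng, k=3):
    """Selection: return a copy of the fittest of k randomly drawn individuals."""
    contenders = rng.sample(range(len(population)), k)
    winner = max(contenders, key=lambda i: fitnesses[i])
    return list(population[winner])


def crossover(a, b, rng):
    """One-point crossover: swap the tails of two parents."""
    point = rng.randint(1, len(a) - 1)
    return a[:point] + b[point:], b[:point] + a[point:]


def mutate(individual, rng, indpb=0.2):
    """Mutation: resample each gene within its bounds with probability indpb."""
    return [rng.randint(*BOUNDS[i]) if rng.random() < indpb else gene
            for i, gene in enumerate(individual)]


def run_ga(fitness, generations=20, pop_size=30, cxpb=0.7, mutpb=0.2, seed=0):
    """Evolve parameter sets; return the best one found and its fitness."""
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    best, best_fit = None, float("-inf")  # a one-entry "hall of fame"
    for gen in range(generations + 1):
        fitnesses = [fitness(ind) for ind in population]
        for ind, fit in zip(population, fitnesses):
            if fit > best_fit:
                best, best_fit = list(ind), fit
        if gen == generations:
            break  # final population evaluated; no further breeding needed
        offspring = [tournament(population, fitnesses, rng)
                     for _ in range(pop_size)]
        for i in range(0, pop_size - 1, 2):
            if rng.random() < cxpb:
                offspring[i], offspring[i + 1] = crossover(
                    offspring[i], offspring[i + 1], rng)
        population = [mutate(ind, rng) if rng.random() < mutpb else ind
                      for ind in offspring]
    return dict(zip(SEARCH_SPACE, best)), best_fit


if __name__ == "__main__":
    best_params, best_score = run_ga(toy_fitness)
    print(best_params, best_score)
```

The generations, pop_size, cxpb, and mutpb arguments correspond to the four GA parameters that the abstract names as influencing the result, so rerunning the sketch with different values gives a rough feel for that sensitivity.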