An evolutionary machine learning algorithm for cardiovascular disease risk prediction

Abstract
This study developed a novel risk assessment model to predict the occurrence of cardiovascular disease (CVD) events. It uses a Genetic Algorithm (GA) to develop an easy-to-use model with high accuracy, calibrated based on the Isfahan Cohort Study (ICS) database. The ICS was a population-based prospective cohort study of 6,504 healthy Iranian adults aged ≥ 35 years followed for incident CVD over ten years, from 2001 to 2010. To develop a risk score, the problem of predicting CVD was solved using a well-designed GA, and finally, the results were compared with classic machine learning (ML) and statistical methods. A number of risk scores such as the WHO, and PARS models were utilized as the baseline for comparison due to their similar chart-based models. The Framingham and PROCAM models were also applied to the dataset, with the area under a Receiver Operating Characteristic curve (AUROC) equal to 0.633 and 0.683, respectively. However, the more complex Deep Learning model using a three-layered Convolutional Neural Network (CNN) performed best among the ML models, with an AUROC of 0.74, and the GA-based eXplanaible Persian Atherosclerotic CVD Risk Stratification (XPARS) showed higher performance compared to the statistical methods. XPARS with eight features showed an AUROC of 0.76, and the XPARS with four features, showed an AUROC of 0.72. A risk model that is extracted using GA substantially improves the prediction of CVD compared to conventional methods. It is clear, interpretable and can be a suitable replacement for conventional statistical methods.