Dementia risk predictions from German claims data using methods of machine learning

Abstract
Introduction: We examined whether German claims data are suitable for dementia risk prediction, how machine learning (ML) compares to classical regression, and which predictors are most important for dementia risk.

Methods: We analyzed data from the largest German health insurance company, including 117,895 dementia-free people aged 65 years and older. Follow-up was 10 years. Predictors were 23 age-related diseases, 212 medical prescriptions, and 87 surgery codes, as well as age and sex. Statistical methods included logistic regression (LR), gradient boosting (GBM), and random forests (RF).

Results: Discriminatory power was moderate for LR (C-statistic = 0.714; 95% confidence interval [CI] = 0.708-0.720) and GBM (C-statistic = 0.707; 95% CI = 0.700-0.713) and lower for RF (C-statistic = 0.636; 95% CI = 0.628-0.643). GBM had the best model calibration. Important predictors included antipsychotic medications and cerebrovascular disease, but also a specific, less well-established antibacterial prescription.

Discussion: Our models based on German claims data have acceptable accuracy and may provide cost-effective decision support for early dementia screening.
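For readers unfamiliar with the discrimination measure reported above, the following minimal Python sketch shows one way a C-statistic and a 95% CI can be computed from predicted risks and observed outcomes. It is an illustration only: the function names `c_statistic` and `bootstrap_ci` are hypothetical, the toy data are invented, and the percentile bootstrap is an assumption, since the abstract does not state how the intervals were derived.

```python
import random


def c_statistic(y_true, y_score):
    """Share of (case, non-case) pairs in which the case has the higher
    predicted risk; ties count as one half. Equals the area under the ROC curve."""
    cases = [s for y, s in zip(y_true, y_score) if y == 1]
    non_cases = [s for y, s in zip(y_true, y_score) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for c in cases:
        for n in non_cases:
            if c > n:
                concordant += 1.0
            elif c == n:
                concordant += 0.5
    return concordant / (len(cases) * len(non_cases))


def bootstrap_ci(y_true, y_score, n_boot=200, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the C-statistic (an assumed method,
    not necessarily the one used in the study)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if len(set(ys)) < 2:  # resample lacks cases or non-cases; draw again
            continue
        stats.append(c_statistic(ys, [y_score[i] for i in idx]))
    stats.sort()
    lower = stats[round(alpha / 2 * n_boot)]
    upper = stats[round((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Invented toy data: 1 = dementia diagnosis during follow-up, 0 = none.
outcomes = [0, 0, 1, 1]
predicted_risk = [0.1, 0.4, 0.35, 0.8]
print(c_statistic(outcomes, predicted_risk))
```

A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of cases from non-cases. The pairwise loop is quadratic in sample size, so a rank-based implementation would be preferable for a cohort as large as the one analyzed here.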