Dementia risk predictions from German claims data using methods of machine learning
Open Access
- 22 April 2022
- journal article
- research article
- Published by Wiley in Alzheimer's & Dementia
- Vol. 19 (2), 477-486
- https://doi.org/10.1002/alz.12663
Abstract
Introduction: We examined whether German claims data are suitable for dementia risk prediction, how machine learning (ML) compares to classical regression, and what the important predictors for dementia risk are.
Methods: We analyzed data from the largest German health insurance company, including 117,895 dementia-free people age 65+. Follow-up was 10 years. Predictors were: 23 age-related diseases, 212 medical prescriptions, 87 surgery codes, as well as age and sex. Statistical methods included logistic regression (LR), gradient boosting (GBM), and random forests (RFs).
Results: Discriminatory power was moderate for LR (C-statistic = 0.714; 95% confidence interval [CI] = 0.708-0.720) and GBM (C-statistic = 0.707; 95% CI = 0.700-0.713) and lower for RF (C-statistic = 0.636; 95% CI = 0.628-0.643). GBM had the best model calibration. We identified antipsychotic medications and cerebrovascular disease but also a less-established specific antibacterial medical prescription as important predictors.
Discussion: Our models from German claims data have acceptable accuracy and may provide cost-effective decision support for early dementia screening.
This publication has 57 references indexed in Scilit:
- Current Developments in Dementia Risk Prediction Modelling: An Updated Systematic Review. PLOS ONE, 2015
- Use of atypical antipsychotics in the elderly: a clinical review. Clinical Interventions in Aging, 2014
- Late-life depression and risk of vascular dementia and Alzheimer's disease: systematic review and meta-analysis of community-based cohort studies. The British Journal of Psychiatry, 2013
- Can we use the pharmacy data to estimate the prevalence of chronic conditions? a comparison of multiple data sources. BMC Public Health, 2011
- Epidemiology of Alzheimer disease. Nature Reviews Neurology, 2011
- The Epidemiology of Dementia Associated with Parkinson's Disease. Brain Pathology, 2010
- Derivation and validation of QRISK, a new cardiovascular disease risk score for the United Kingdom: prospective open cohort study. BMJ, 2007
- Boosting with early stopping: Convergence and consistency. The Annals of Statistics, 2005
- Statistical Modeling: The Two Cultures (with comments and a rejoinder by the author). Statistical Science, 2001
- Validation of Probabilistic Predictions. Medical Decision Making, 1993
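
The abstract reports discrimination as a C-statistic with a 95% confidence interval. As a rough illustration of what that metric measures, the sketch below computes a C-statistic by pairwise comparison and attaches a percentile bootstrap interval. Everything here is an assumption for illustration: the data are synthetic, the function names `c_statistic` and `bootstrap_ci` are hypothetical, and the abstract does not state how the authors derived their intervals, so the bootstrap may differ from their method.

```python
import random


def c_statistic(y_true, y_score):
    """Share of (case, non-case) pairs in which the case has the higher score.

    Ties count as half a concordant pair. For a binary outcome this equals
    the area under the ROC curve.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))


def bootstrap_ci(y_true, y_score, n_boot=200, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the C-statistic (illustrative choice)."""
    if len(set(y_true)) < 2:
        raise ValueError("need both outcome classes")
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample lacked one class; draw again
        stats.append(c_statistic(ys, [y_score[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Synthetic outcomes and risk scores, not the claims data from the study.
rng = random.Random(42)
y = [1 if rng.random() < 0.3 else 0 for _ in range(200)]
score = [rng.gauss(0.5 if yi else 0.0, 1.0) for yi in y]

auc = c_statistic(y, score)
lo, hi = bootstrap_ci(y, score)
print(f"C-statistic = {auc:.3f} (95% CI {lo:.3f}-{hi:.3f})")
```

The pairwise loop is quadratic in sample size, which is fine for a toy example but would be impractical for a cohort of 117,895 people; a rank-based formulation or an established library routine would be the usual choice at that scale.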