Dementia risk predictions from German claims data using methods of machine learning

Open Access

22 April 2022

journal article
research article
Published by Wiley in Alzheimer's & Dementia

Vol. 19 (2), 477-486
https://doi.org/10.1002/alz.12663

Abstract

Introduction: We examined whether German claims data are suitable for dementia risk prediction, how machine learning (ML) compares to classical regression, and what the important predictors for dementia risk are.

Methods: We analyzed data from the largest German health insurance company, including 117,895 dementia-free people age 65+. Follow-up was 10 years. Predictors were: 23 age-related diseases, 212 medical prescriptions, 87 surgery codes, as well as age and sex. Statistical methods included logistic regression (LR), gradient boosting (GBM), and random forests (RFs).

Results: Discriminatory power was moderate for LR (C-statistic = 0.714; 95% confidence interval [CI] = 0.708-0.720) and GBM (C-statistic = 0.707; 95% CI = 0.700-0.713) and lower for RF (C-statistic = 0.636; 95% CI = 0.628-0.643). GBM had the best model calibration. We identified antipsychotic medications and cerebrovascular disease but also a less-established specific antibacterial medical prescription as important predictors.

Discussion: Our models from German claims data have acceptable accuracy and may provide cost-effective decision support for early dementia screening.
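
The abstract reports discriminatory power as a C-statistic with a 95% confidence interval. As an illustration of what that metric measures, the following minimal sketch computes a C-statistic from predicted risks and attaches a percentile bootstrap interval. The abstract does not say how the authors obtained their intervals, so the bootstrap is an assumption here, the function names `c_statistic` and `bootstrap_ci` are hypothetical, and the data are synthetic rather than taken from the study.

```python
import random


def c_statistic(y_true, y_score):
    """C-statistic (area under the ROC curve).

    Probability that a randomly chosen case (label 1) has a higher
    predicted risk than a randomly chosen non-case (label 0); tied
    scores count as one half. Computed from average ranks.
    """
    pairs = sorted(zip(y_score, y_true))
    n = len(pairs)
    n_pos = sum(y_true)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one case and one non-case")
    rank_sum_pos = 0.0
    i = 0
    while i < n:
        # Find the block of tied scores starting at position i.
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1..j
        rank_sum_pos += avg_rank * sum(label for _, label in pairs[i:j])
        i = j
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def bootstrap_ci(y_true, y_score, n_boot=200, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the C-statistic (an assumed
    method, not necessarily the one used in the paper)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yt = [y_true[k] for k in idx]
        if 0 < sum(yt) < n:  # skip resamples that contain only one class
            stats.append(c_statistic(yt, [y_score[k] for k in idx]))
    stats.sort()
    lo_idx = max(0, int((alpha / 2) * n_boot))
    hi_idx = min(n_boot - 1, int((1 - alpha / 2) * n_boot) - 1)
    return stats[lo_idx], stats[hi_idx]


# Synthetic illustration only: about 20% cases, scores shifted upward for cases.
rng = random.Random(42)
labels = [1 if rng.random() < 0.2 else 0 for _ in range(300)]
if sum(labels) in (0, len(labels)):
    labels[0] = 1 - labels[0]  # guarantee both classes are present
scores = [rng.gauss(0.8 * y, 1.0) for y in labels]

auc = c_statistic(labels, scores)
ci_low, ci_high = bootstrap_ci(labels, scores)
print(f"C-statistic = {auc:.3f}; 95% CI = {ci_low:.3f}-{ci_high:.3f}")
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the reported values of 0.714 (LR), 0.707 (GBM), and 0.636 (RF) should be read.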
