Post-Analysis of Predictive Modeling with an Epidemiological Example
Open Access
- 24 June 2021
- journal article
- research article
- Published by MDPI AG in Healthcare
- Vol. 9 (7), 792
- https://doi.org/10.3390/healthcare9070792
Abstract
Post-analysis of predictive models fosters their application in practice, as domain experts want to understand the logic behind them. In epidemiology, methods explaining sophisticated models facilitate the usage of up-to-date tools, especially in the high-dimensional predictor space. Investigating how model performance varies for subjects with different conditions is one of the important parts of post-analysis. This paper presents a model-independent approach for post-analysis, aiming to reveal those subjects’ conditions that lead to low or high model performance, compared to the average level on the whole sample. Conditions of interest are presented in the form of rules generated by a multi-objective evolutionary algorithm (MOGA). In this study, Lasso logistic regression (LLR) was trained to predict cardiovascular death by 2016 using the data from the 1984–1989 examination within the Kuopio Ischemic Heart Disease Risk Factor Study (KIHD), which contained 2682 subjects and 950 preselected predictors. After 50 independent runs of five-fold cross-validation, the model performance collected for each subject was used to generate rules describing “easy” and “difficult” cases. LLR with 61 selected predictors, on average, achieved 72.53% accuracy on the whole sample. However, during post-analysis, three categories of subjects were discovered: “easy” cases with an LLR accuracy of 95.84%, “difficult” cases with an LLR accuracy of 48.11%, and the remaining cases with an LLR accuracy of 71.00%. Moreover, the rule analysis showed that medication was one of the main confusing factors that led to lower model performance. The proposed approach provides insightful information about subjects’ conditions that complicate predictive modeling.
Funding Information
- Itä-Suomen Yliopisto (An early stage researcher position for Christina Brester)
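The post-analysis idea summarized in the abstract (collect each subject's performance over repeated cross-validation runs, then compare the accuracy of subjects matching a rule with the whole-sample average) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the toy subjects, the predictor names `age` and `on_medication`, the interval-based rule format, and the helper names are hypothetical. They are not the paper's rule encoding, its MOGA objectives, or KIHD data.

```python
from statistics import mean


def per_subject_accuracy(correct_by_run):
    """Average each subject's 0/1 correctness over repeated CV runs.

    correct_by_run[r][i] is 1 if subject i was classified correctly
    in run r, else 0.
    """
    n_subjects = len(correct_by_run[0])
    return [mean(run[i] for run in correct_by_run) for i in range(n_subjects)]


def rule_covers(subject, rule):
    """Hypothetical rule format: {predictor: (low, high)}, all must hold."""
    return all(low <= subject[name] <= high for name, (low, high) in rule.items())


def evaluate_rule(subjects, accuracy, rule):
    """Return (number of covered subjects, their mean accuracy or None)."""
    covered = [acc for subj, acc in zip(subjects, accuracy)
               if rule_covers(subj, rule)]
    if not covered:
        return 0, None
    return len(covered), mean(covered)


# Invented toy data: four subjects, four runs (not study results).
subjects = [
    {"age": 54, "on_medication": 1},
    {"age": 48, "on_medication": 0},
    {"age": 60, "on_medication": 1},
    {"age": 42, "on_medication": 0},
]
correct_by_run = [
    [0, 1, 1, 1],
    [1, 1, 0, 1],
    [0, 1, 0, 1],
    [0, 1, 1, 1],
]

accuracy = per_subject_accuracy(correct_by_run)
overall = mean(accuracy)

# A candidate "difficult cases" rule: subjects on medication.
rule = {"on_medication": (1, 1)}
n_covered, rule_accuracy = evaluate_rule(subjects, accuracy, rule)

print(f"overall accuracy: {overall:.4f}")
print(f"rule covers {n_covered} subjects, accuracy {rule_accuracy:.4f}")
```

In this sketch a rule is judged only by how many subjects it covers and how far their mean accuracy departs from the overall level; a search procedure such as an evolutionary algorithm would generate and compare many such candidate rules.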