Using random-forest multiple imputation to address bias of self-reported anthropometric measures, hypertension and hypercholesterolemia in the Belgian health interview survey
Open Access
- 25 March 2023
- journal article
- research article
- Published by Springer Science and Business Media LLC in BMC Medical Research Methodology
- Vol. 23 (1), 1-15
- https://doi.org/10.1186/s12874-023-01892-x
Abstract
In many countries, the prevalence of risk factors for non-communicable diseases is commonly assessed through self-reported information from health interview surveys. It has been shown, however, that using self-reported rather than objective data leads to an underestimation of the prevalence of obesity, hypertension and hypercholesterolemia. This study aimed to assess the agreement between self-reported and measured height, weight, hypertension and hypercholesterolemia and to identify an adequate approach for valid measurement error correction.

A total of 9439 participants of the 2018 Belgian health interview survey (BHIS) older than 18 years, of whom 1184 also participated in the 2018 Belgian health examination survey (BELHES), were included in the analysis. Regression calibration was compared with multiple imputation by chained equations based on parametric and non-parametric techniques.

This study confirmed the underestimation of risk factor prevalence based on self-reported data. With both regression calibration and multiple imputation, adjusting these variables in the BHIS yielded national prevalence estimates that were closer to their BELHES clinical counterparts. For overweight, obesity and hypertension, all methods provided smaller standard errors than those obtained with clinical data. However, for hypercholesterolemia, for which the regression model's accuracy was poor, multiple imputation was the only approach that provided smaller standard errors than those based on clinical data.

Random-forest multiple imputation proves to be the method of choice to correct the bias related to self-reported data in the BHIS. This method is particularly useful to enable improved secondary analysis of self-reported data by using information included in the BELHES. Whenever feasible, combined information from health interview surveys and objective measurements should be used in risk factor monitoring.
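To make the idea of regression calibration concrete, the following is a minimal sketch in Python and not the authors' implementation. It assumes a simple linear calibration model and uses entirely synthetic BMI values; only the two sample sizes are taken from the abstract. The function name, the reporting-error model and the variable names are hypothetical choices made for illustration.

```python
import numpy as np


def regression_calibration(self_reported, measured_subsample, subsample_idx):
    """Fit measured ~ intercept + slope * self_reported on the validation
    subsample, then predict a calibrated value for every participant."""
    x = self_reported[subsample_idx]
    slope, intercept = np.polyfit(x, measured_subsample, deg=1)
    return intercept + slope * self_reported


def obesity_prevalence(bmi):
    """Share of participants with BMI >= 30 kg/m^2."""
    return float(np.mean(bmi >= 30.0))


rng = np.random.default_rng(0)
n, n_val = 9439, 1184  # sample sizes from the abstract; the data are synthetic

# Synthetic "true" BMI; in a real survey this is only known for the subsample.
true_bmi = rng.normal(26.0, 4.5, size=n)

# Hypothetical reporting error: people with a higher BMI under-report more.
self_bmi = true_bmi - 0.08 * (true_bmi - 20.0) + rng.normal(0.0, 0.8, size=n)

# Validation subsample with both self-reported and measured values.
val_idx = rng.choice(n, size=n_val, replace=False)
calibrated_bmi = regression_calibration(self_bmi, true_bmi[val_idx], val_idx)

print("self-reported:", obesity_prevalence(self_bmi))
print("calibrated:   ", obesity_prevalence(calibrated_bmi))
print("measured:     ", obesity_prevalence(true_bmi))
```

In this sketch every participant receives a single predicted value. Multiple imputation differs in that it draws several plausible values per participant and pools the resulting estimates, so that the uncertainty of the correction is carried into the standard errors.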