Missing Data Approaches in eHealth Research: Simulation Study and a Tutorial for Nonmathematically Inclined Researchers

Open Access

19 December 2010

journal article
Published by JMIR Publications Inc. in Journal of Medical Internet Research

Vol. 12 (5), e54
https://doi.org/10.2196/jmir.1448

Abstract

Background: Missing data is a common nuisance in eHealth research: it is hard to prevent and may invalidate research findings.

Objective: In this paper several statistical approaches to data “missingness” are discussed and tested in a simulation study. Basic approaches (complete case analysis, mean imputation, and last observation carried forward [LOCF]) and advanced methods (expectation maximization, regression imputation, and multiple imputation) are included in this analysis, and their strengths and weaknesses are discussed.

Methods: The dataset used for the simulation was obtained from a prospective cohort study following participants in an online self-help program for problem drinkers. It contained 124 nonnormally distributed endpoints, that is, daily alcohol consumption counts of the study respondents. Missingness at random (MAR) was induced in a selected variable for 50% of the cases. Validity, reliability, and coverage of the estimates obtained using the different imputation methods were calculated by performing a bootstrapping simulation study.

Results: In the performed simulation study, the use of multiple imputation techniques led to accurate results. Differences were found between the 4 tested multiple imputation programs: NORM, MICE, Amelia II, and SPSS MI. Among the tested approaches, Amelia II outperformed the others: it led to the smallest deviation from the reference value (Cohen’s d = 0.06) and had the largest coverage percentage of the reference confidence interval (96%).

Conclusions: The use of multiple imputation improves the validity of the results when analyzing datasets with missing observations. Some of the often-used approaches (LOCF, complete case analysis) did not perform well, and, hence, we recommend not using these. Recent versions of some widely used statistical software programs increasingly support the analysis of multiply imputed datasets, making multiple imputation more readily available to less mathematically inclined researchers. [J Med Internet Res 2010;12(5):e54]
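To make the compared approaches concrete, the following is a minimal sketch, not the study's procedure. It uses synthetic Poisson counts rather than the cohort data, and plain NumPy rather than NORM, MICE, Amelia II, or SPSS MI. All variable names, the sample size, the missingness model, and the linear imputation model are assumptions chosen for illustration. It induces MAR missingness in roughly half of the cases and then estimates a mean by complete case analysis, mean imputation, regression imputation, and a simple bootstrap-based multiple imputation pooled with Rubin's rules.

```python
import numpy as np

rng = np.random.default_rng(2010)
n = 5000  # assumed sample size, not the study's

# Synthetic stand-in data (NOT the study's dataset): x is always observed,
# y is the count variable that will receive missing values.
x = rng.poisson(5, size=n)
y_full = rng.poisson(1 + 0.8 * x)
true_mean = y_full.mean()  # reference value from the complete data

# MAR: the probability that y is missing depends only on the observed x.
p_missing = 1.0 / (1.0 + np.exp(-0.8 * (x - 5)))
missing = rng.random(n) < p_missing
observed = ~missing


def fit_line(xs, ys):
    """Least squares fit of ys on xs: intercept, slope, residual SD."""
    slope, intercept = np.polyfit(xs, ys, 1)
    resid = ys - (intercept + slope * xs)
    return intercept, slope, resid.std(ddof=2)


# Complete case analysis: use only the rows where y is observed.
cc_mean = y_full[observed].mean()

# Mean imputation: fill every gap with the observed mean.
mean_imp_mean = np.where(missing, cc_mean, y_full).mean()

# Regression imputation: fill each gap with its predicted value given x.
b0, b1, _ = fit_line(x[observed], y_full[observed])
reg_mean = np.where(missing, b0 + b1 * x, y_full).mean()

# Multiple imputation (simplified): refit the model on a bootstrap sample of
# the observed rows, then impute prediction plus normal noise. Normal noise
# on count data is a simplification made for this sketch.
m = 20
obs_idx = np.flatnonzero(observed)
estimates, variances = [], []
for _ in range(m):
    boot = rng.choice(obs_idx, size=obs_idx.size, replace=True)
    b0_k, b1_k, sd_k = fit_line(x[boot], y_full[boot])
    draws = b0_k + b1_k * x + rng.normal(0.0, sd_k, size=n)
    y_k = np.where(missing, draws, y_full)
    estimates.append(y_k.mean())
    variances.append(y_k.var(ddof=1) / n)

# Rubin's rules: pool the m estimates and combine the variance components.
q_bar = np.mean(estimates)
within = np.mean(variances)
between = np.var(estimates, ddof=1)
total_var = within + (1 + 1 / m) * between

print(f"missing fraction:      {missing.mean():.2f}")
print(f"reference mean:        {true_mean:.3f}")
print(f"complete case:         {cc_mean:.3f}")
print(f"mean imputation:       {mean_imp_mean:.3f}")
print(f"regression imputation: {reg_mean:.3f}")
print(f"multiple imputation:   {q_bar:.3f} (SE {np.sqrt(total_var):.3f})")
```

In this setup, mean imputation reproduces the complete case estimate by construction, so it cannot correct a bias caused by MAR dropout. The multiple imputation step also carries between-imputation variance into its standard error, which single imputation ignores.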

