The performance of inverse probability of treatment weighting and full matching on the propensity score in the presence of model misspecification when estimating the effect of treatment on survival outcomes

Open Access

30 April 2015

journal article
Published by SAGE Publications in Statistical Methods in Medical Research

Vol. 26 (4), 1654-1670
https://doi.org/10.1177/0962280215584401

Abstract

There is increasing interest in estimating the causal effects of treatments using observational data. Propensity-score matching methods are frequently used to adjust for differences in observed characteristics between treated and control individuals in observational studies. Survival or time-to-event outcomes occur frequently in the medical literature, but the use of propensity score methods in survival analysis has not been thoroughly investigated. This paper compares two approaches for estimating the Average Treatment Effect (ATE) on survival outcomes: Inverse Probability of Treatment Weighting (IPTW) and full matching. The performance of these methods was compared in an extensive set of simulations that varied the extent of confounding and the amount of misspecification of the propensity score model. We found that both IPTW and full matching estimated marginal hazard ratios with negligible bias when the ATE was the target estimand and the treatment-selection process was weak to moderate. However, when the treatment-selection process was strong, both methods produced biased estimates of the true marginal hazard ratio, even when the propensity score model was correctly specified. When the propensity score model was correctly specified, bias tended to be lower for full matching than for IPTW. Both the biases and the differences between the two methods appeared to be due to some extreme weights generated by each method. Both methods tended to produce more extreme weights as the magnitude of the effects of covariates on treatment selection increased. Furthermore, more extreme weights were observed for IPTW than for full matching. However, the poorer performance of both methods in the presence of a strong treatment-selection process was mitigated by the use of IPTW with restriction and full matching with a caliper restriction when the propensity score model was correctly specified.
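
To make the role of extreme weights concrete, the following is a minimal sketch of standard ATE weights for IPTW: a treated subject with propensity score e receives weight 1/e, and a control subject receives 1/(1 - e). The function names, the toy data, and the restriction bounds of 0.05 and 0.95 are illustrative assumptions, not values or rules taken from the paper; restriction is sketched here simply as excluding subjects whose propensity score falls outside a chosen range.

```python
def ate_weights(treated, propensity):
    """Return ATE inverse-probability-of-treatment weights.

    treated: iterable of 0/1 treatment indicators.
    propensity: iterable of estimated P(treated = 1 | covariates).
    """
    weights = []
    for z, e in zip(treated, propensity):
        if not 0.0 < e < 1.0:
            raise ValueError("propensity scores must lie strictly between 0 and 1")
        # Treated subjects get 1/e, controls get 1/(1 - e).
        weights.append(1.0 / e if z == 1 else 1.0 / (1.0 - e))
    return weights


def restrict(treated, propensity, lower=0.05, upper=0.95):
    """Keep subjects whose propensity score lies in [lower, upper].

    The default bounds are hypothetical, chosen only for illustration.
    """
    kept = [(z, e) for z, e in zip(treated, propensity) if lower <= e <= upper]
    return [z for z, _ in kept], [e for _, e in kept]


# Toy data (made up): one treated subject with a very low score and one
# control with a very high score, as a strong selection process produces.
treated = [1, 1, 0, 0, 1, 0]
propensity = [0.5, 0.02, 0.5, 0.99, 0.8, 0.2]

raw = ate_weights(treated, propensity)
z_kept, e_kept = restrict(treated, propensity)
restricted = ate_weights(z_kept, e_kept)

print("largest raw weight:       ", round(max(raw), 2))
print("largest restricted weight:", round(max(restricted), 2))
```

In this toy sample, two of the six subjects carry nearly all of the total weight before restriction, which illustrates how a few subjects can dominate a weighted estimate. Restriction removes them, at the cost of changing the population to which the estimate applies.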


Cited by 201 articles