The performance of inverse probability of treatment weighting and full matching on the propensity score in the presence of model misspecification when estimating the effect of treatment on survival outcomes
Open Access
- 30 April 2015
- journal article
- Published by SAGE Publications in Statistical Methods in Medical Research
- Vol. 26 (4), 1654-1670
- https://doi.org/10.1177/0962280215584401
Abstract
There is increasing interest in estimating the causal effects of treatments using observational data. Propensity score matching methods are frequently used to adjust for differences in observed characteristics between treated and control individuals in observational studies. Survival or time-to-event outcomes occur frequently in the medical literature, but the use of propensity score methods in survival analysis has not been thoroughly investigated. This paper compares two approaches for estimating the Average Treatment Effect (ATE) on survival outcomes: Inverse Probability of Treatment Weighting (IPTW) and full matching. The performance of these methods was compared in an extensive set of simulations that varied the extent of confounding and the amount of misspecification of the propensity score model. We found that both IPTW and full matching resulted in estimation of marginal hazard ratios with negligible bias when the ATE was the target estimand and the treatment-selection process was weak to moderate. However, when the treatment-selection process was strong, both methods resulted in biased estimation of the true marginal hazard ratio, even when the propensity score model was correctly specified. When the propensity score model was correctly specified, bias tended to be lower for full matching than for IPTW. These biases, and the differences between the two methods, appeared to be due to extreme weights generated by each method. Both methods tended to produce more extreme weights as the magnitude of the effects of covariates on treatment selection increased, and more extreme weights were observed for IPTW than for full matching. However, when the propensity score model was correctly specified, the poorer performance of both methods in the presence of a strong treatment-selection process was mitigated by the use of IPTW with restriction and full matching with a caliper restriction.
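The link between strong treatment selection and extreme weights can be illustrated with a minimal sketch of ATE weights, in which each subject is weighted by the inverse of the probability of the treatment actually received. The function names, the toy data, and the restriction bounds of 0.05 and 0.95 are assumptions made for illustration only; they are not the restriction rule or the simulation design used in the paper.

```python
import numpy as np

def ate_weights(treated, propensity):
    """ATE weights: 1/e for treated subjects, 1/(1 - e) for controls."""
    treated = np.asarray(treated, dtype=float)
    propensity = np.asarray(propensity, dtype=float)
    return treated / propensity + (1.0 - treated) / (1.0 - propensity)

def restrict_to_range(propensity, lower=0.05, upper=0.95):
    """Boolean mask of subjects whose propensity score lies in [lower, upper].

    The bounds are hypothetical defaults chosen for this sketch.
    """
    propensity = np.asarray(propensity, dtype=float)
    return (propensity >= lower) & (propensity <= upper)

# Toy data: the third subject was treated despite a propensity score of 0.02.
treated = np.array([1, 0, 1, 0, 1])
propensity = np.array([0.5, 0.5, 0.02, 0.2, 0.8])

weights = ate_weights(treated, propensity)  # third subject gets weight 50
keep = restrict_to_range(propensity)        # third subject is excluded
print(weights)
print(keep)
```

A treated subject with a propensity score near zero, or a control with a score near one, dominates the weighted sample. Under the assumptions of this sketch, restriction removes such subjects before the weighted outcome model is fitted.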