Robust policy evaluation from large-scale observational studies

Abstract
Under the current policy decision making paradigm we make or evaluate a policy decision by intervening different socio-economic parameters and analyzing the impact of those interventions. This process involves identifying the causal relation between interventions and outcomes. Matching method is one of the popular techniques to identify such causal relations. However, in one-to-one matching, when a treatment or control unit has multiple pair assignment options with similar match quality, different matching algorithms often assign different pairs. Since all the matching algorithms assign pairs without considering the outcomes, it is possible that with the same data and same hypothesis, different experimenters can reach different conclusions creating an uncertainty in policy decision making. This problem becomes more prominent in the case of large-scale observational studies as there are more pair assignment options. Recently, a robust approach has been proposed to tackle the uncertainty that uses an integer programming model to explore all possible assignments. Though the proposed integer programming model is very efficient in making robust causal inference, it is not scalable to big data observational studies. With the current approach, an observational study with 50,000 samples will generate hundreds of thousands binary variables. Solving such integer programming problem is computationally expensive and becomes even worse with the increase of sample size. In this work, we consider causal inference testing with binary outcomes and propose computationally efficient algorithms that are adaptable for large-scale observational studies. By leveraging the structure of the optimization model, we propose a robustness condition that further reduces the computational burden. We validate the efficiency of the proposed algorithms by testing the causal relation between the Medicare Hospital Readmission Reduction Program (HRRP) and non-index readmissions (i.e., readmission to a hospital that is different from the hospital that discharged the patient) from the State of California Patient Discharge Database from 2010 to 2014. Our result shows that HRRP has a causal relation with the increase in non-index readmissions. The proposed algorithms proved to be highly scalable in testing causal relations from large-scale observational studies.