Testing for independence in contingency tables with complex sample survey data
- 11 March 2015
- journal article
- research article
- Published by Oxford University Press (OUP) in Biometrics
- Vol. 71 (3), 832-840
- https://doi.org/10.1111/biom.12297
Abstract
The test of independence of row and column variables in a contingency table is a widely used statistical test in many areas of application. For complex survey samples, use of the standard Pearson chi‐squared test is inappropriate due to correlation among units within the same cluster. Rao and Scott (1981, Journal of the American Statistical Association 76, 221–230) proposed an approach in which the standard Pearson chi‐squared statistic is multiplied by a design effect to adjust for the complex survey design. Unfortunately, this test fails to exist when one of the observed cell counts equals zero. Even with the large samples typical of many complex surveys, zero cell counts can occur for rare events, small domains, or contingency tables with a large number of cells. Here, we propose Wald and score test statistics for independence based on weighted least squares estimating equations. In contrast to the Rao–Scott test statistic, the proposed Wald and score test statistics always exist. In simulations, the score test is found to perform best with respect to type I error. The proposed method is motivated by, and applied to, post surgical complications data from the United States’ Nationwide Inpatient Sample (NIS) complex survey of hospitals in 2008.
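To make the design-effect adjustment described in the abstract concrete, the following is a minimal sketch, not the authors' method. It assumes the design effect `deff` is supplied by the analyst and writes the adjustment as division by `deff` (that is, multiplication by its reciprocal). The helper `cell_design_effect` is hypothetical: it only illustrates one plausible way a zero cell can leave a design-effect estimate undefined, namely a ratio of a design-based variance to a simple-random-sampling variance that becomes 0/0 when the estimated cell proportion is zero. All function names here are illustrative assumptions.

```python
def pearson_chi2(table):
    """Pearson chi-squared statistic for a two-way table of counts.

    Assumes every row and column total is positive, so that all
    expected counts are non-zero.
    """
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def design_adjusted_chi2(table, deff):
    """Scale the Pearson statistic by an assumed design effect.

    `deff` is taken as given here; estimating it from the survey
    design is outside the scope of this sketch.
    """
    if deff <= 0:
        raise ValueError("design effect must be positive")
    return pearson_chi2(table) / deff


def cell_design_effect(p_hat, design_var, n):
    """Hypothetical cell-level design effect: design variance / SRS variance.

    Returns None when the SRS variance p(1 - p)/n is zero, which is
    what happens for an empty cell (p_hat == 0).
    """
    srs_var = p_hat * (1.0 - p_hat) / n
    if srs_var == 0:
        return None  # ratio is undefined for a zero cell
    return design_var / srs_var


if __name__ == "__main__":
    counts = [[10, 20], [20, 10]]  # made-up counts for illustration
    print(pearson_chi2(counts))
    print(design_adjusted_chi2(counts, deff=2.0))
    print(cell_design_effect(0.0, 0.0, 100))
```

In this sketch the adjusted statistic shrinks as the assumed design effect grows, reflecting the loss of information from within-cluster correlation, while the zero-cell case returns `None` rather than a number.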
Funding Information
- National Institutes of Health (CA 06922)
- National Institutes of Health (CA 160679)
This publication has 11 references indexed in Scilit:
- Comparative Analysis of Outcomes and Costs Following Open Radical Cystectomy Versus Robot-Assisted Laparoscopic Radical Cystectomy: Results From the US Nationwide Inpatient Sample. European Urology, 2012
- Rao's score, Neyman's C(α) and Silvey's LM tests: an essay on historical developments and some new results. Journal of Statistical Planning and Inference, 2001
- On Generalized Score Tests. The American Statistician, 1992
- Hypothesis testing of regression parameters in semiparametric generalized linear models for cluster correlated data. Biometrika, 1990
- Generalized Linear Models. Published by Springer Science and Business Media LLC, 1989
- The Analysis of Cross-Classified Data Having Ordered and/or Unordered Categories: Association Models, Correlation Models, and Asymmetry Models for Contingency Tables With or Without Missing Entries. The Annals of Statistics, 1985
- The Analysis of Categorical Data from Complex Sample Surveys: Chi-Squared Tests for Goodness of Fit and Independence in Two-Way Tables. Journal of the American Statistical Association, 1981
- Wald's Test as Applied to Hypotheses in Logit Analysis. Journal of the American Statistical Association, 1977
- The Lagrangian Multiplier Test. The Annals of Mathematical Statistics, 1959
- Maximum-Likelihood Estimation of Parameters Subject to Restraints. The Annals of Mathematical Statistics, 1958