Models for Contingency Tables With Known Margins When Target and Sampled Populations Differ

Abstract
The analysis of two-way contingency tables with known margins is considered. Four methods for estimating the cell probabilities are compared, namely, raking (RAKE), maximum likelihood under random sampling (MLRS), minimum chi-squared (MCSQ), and least squares (LSQ). Assuming random sampling from the target population, these methods are known to be asymptotically equivalent, and small-sample studies have suggested that MCSQ is slightly better than the other methods in average root mean squared error. We consider properties of the methods when the sampled population differs from the target population, through deficiencies in the sampling frame or defects in the implementation of the sample. We show that each method is in fact maximum likelihood for a particular model relating the target and sampled populations. Expressions for the standard errors of the estimates are developed under these alternative models. The methods are compared on data from a health survey and in a simulation study where each of the methods is assessed using data generated in a variety of ways. The results suggest that LSQ is inferior to the other three methods, and RAKE and MLRS dominate MCSQ.
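
As an illustration of the first of the four methods, the following is a minimal sketch of raking in its standard iterative proportional fitting form, which may differ in detail from the formulation used in the paper. The function name `rake`, its parameters, and the 2 x 2 counts and margins are hypothetical and chosen only for illustration; they are not taken from the health survey or the simulation study. The sketch assumes all observed cell counts are positive and that the known row and column margins each sum to one.

```python
import numpy as np

def rake(counts, row_margins, col_margins, tol=1e-10, max_iter=1000):
    """Adjust sample cell proportions to match known margins by raking.

    Illustrative sketch only: assumes every cell count is positive and
    that both sets of known margins sum to one.
    """
    p = np.asarray(counts, dtype=float)
    p = p / p.sum()                      # start from sample proportions
    r = np.asarray(row_margins, dtype=float)
    c = np.asarray(col_margins, dtype=float)
    for _ in range(max_iter):
        # Scale each row so the row totals match the known row margins.
        p = p * (r / p.sum(axis=1))[:, None]
        # Scale each column so the column totals match the known column margins.
        p = p * (c / p.sum(axis=0))[None, :]
        # Columns match exactly after the last step; stop when rows also match.
        if np.max(np.abs(p.sum(axis=1) - r)) < tol:
            break
    return p

# Hypothetical sample counts and known target margins.
counts = np.array([[30, 20], [10, 40]])
est = rake(counts, row_margins=[0.6, 0.4], col_margins=[0.5, 0.5])
```

The raked estimates reproduce the known margins while preserving the cross-product (odds) ratios of the observed table, so only the marginal distributions are changed.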