Computing Distributions for Exact Logistic Regression

Abstract
Logistic regression is a commonly used technique for the analysis of retrospective and prospective epidemiological and clinical studies with binary response variables. Usually this analysis is performed using large sample approximations. When the sample size is small or the data structure sparse, the accuracy of the asymptotic approximations is in question. On other occasions, singularity of the covariance matrix of parameter estimates precludes asymptotic analysis. Under these circumstances, use of exact inferential procedures would seem to be a prudent alternative. Cox (1970) showed that exact inference on the parameters of a logistic model with binary response requires consideration of the distribution of sufficient statistics for these parameters. To date, however, resorting to the exact method has not been computationally feasible except in a few special situations. This article presents an efficient recursive algorithm that generates the joint and conditional distributions of the sufficient statistics and thus makes it feasible to perform exact inference for a much wider range of situations. Various methods of improving the efficiency of the basic algorithm, such as the application of appropriate criteria to delete infeasible vectors, recoding covariates, sorting observations by covariate values, and use of a two-step recursive procedure, are also described. The algorithm given in this article enables the data analyst to perform exact inference for models with or without interaction terms and for matched as well as unmatched designs. The exact analysis proposed by Cox (1970) was restricted to a single parameter. Since our algorithm can be used to generate any combination of joint and conditional distributions of the sufficient statistics, it paves the way for multiparametric exact inference. The algorithm also provides a tool for comparing exact and asymptotic inferential procedures. Such comparisons would, it is hoped, provide statisticians with guidelines stating when each of the procedures should be preferred.
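
To make the idea of recursively generating the distribution of the sufficient statistics concrete, the following is a minimal Python sketch. It is not the algorithm of this article: it is a plain observation-by-observation recursion with none of the efficiency devices mentioned above (deletion of infeasible vectors, recoding, sorting, or the two-step procedure). It assumes integer-valued covariates and the usual sufficient-statistic vector t = sum over i of y_i x_i, where x_i is the covariate row of observation i with a leading 1 for the intercept. The function names joint_counts, conditional_counts, and conditional_pmf, and the small data set, are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import exp


def joint_counts(X):
    """Count the binary response vectors y that yield each value of
    t = sum_i y_i * x_i, adding one observation at a time."""
    p = len(X[0])
    counts = Counter({(0,) * p: 1})  # no observations yet: only t = 0
    for row in X:
        nxt = Counter()
        for t, c in counts.items():
            nxt[t] += c  # case y_i = 0: t is unchanged
            shifted = tuple(a + b for a, b in zip(t, row))
            nxt[shifted] += c  # case y_i = 1: t moves by x_i
        counts = nxt
    return counts


def conditional_counts(counts, fixed):
    """Keep only vectors t whose components match `fixed` (index -> value)
    and return counts over the remaining components."""
    out = Counter()
    for t, c in counts.items():
        if all(t[j] == v for j, v in fixed.items()):
            rest = tuple(v for j, v in enumerate(t) if j not in fixed)
            out[rest] += c
    return out


def conditional_pmf(cond, beta):
    """Conditional distribution of a single remaining statistic t1 at slope
    beta: proportional to count(t1) * exp(beta * t1)."""
    weights = {t[0]: c * exp(beta * t[0]) for t, c in cond.items()}
    total = sum(weights.values())
    return {t1: w / total for t1, w in weights.items()}


# Hypothetical data: intercept column plus one integer covariate.
X = [(1, 0), (1, 1), (1, 1), (1, 2)]
joint = joint_counts(X)
# Condition on the intercept's sufficient statistic (two responses equal 1).
cond = conditional_counts(joint, {0: 2})
pmf_null = conditional_pmf(cond, beta=0.0)
```

Conditioning on the sufficient statistic of the intercept removes that nuisance parameter, so the resulting distribution depends only on the slope. A direct recursion of this kind can store a number of distinct vectors that grows quickly with the number of observations and covariates, which is the cost the efficiency improvements described in the abstract are meant to reduce.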