How conservative is Fisher's exact test? A quantitative evaluation of the two‐sample comparative binomial trial

Abstract
The debate as to which statistical methodology is most appropriate for the analysis of the two‐sample comparative binomial trial has persisted for decades. Practitioners who favor Fisher's conditional method, Fisher's exact test (FET), claim that only experimental outcomes containing the same amount of information should be considered when performing analyses. Hence, the total number of successes should be fixed at its observed level in hypothetical repetitions of the experiment. Using conditional methods in clinical settings can pose interpretation difficulties, since results are derived using conditional sample spaces rather than the set of all possible outcomes. Perhaps more importantly from a clinical trial design perspective, this test can be too conservative, resulting in greater resource requirements and more subjects exposed to an experimental treatment. The actual significance level attained by FET (the size of the test) has not been reported in the statistical literature. Berger (J. R. Statist. Soc. D (The Statistician) 2001; 50:79–85) proposed assessing the conservativeness of conditional methods using p‐value confidence intervals. In this paper we develop a numerical algorithm that calculates the size of FET for sample sizes, n, up to 125 per group at the two‐sided significance level, α=0.05. Additionally, this numerical method is used to define new significance levels α*=α+ε, where ε is a small positive number, for each n, such that the size of the test is as close as possible to the pre‐specified α (0.05 for the current work) without exceeding it. Lastly, a sample size and power calculation example is presented, demonstrating the statistical advantages of implementing the adjustment to FET (using α* instead of α) in the two‐sample comparative binomial trial. Copyright © 2008 John Wiley & Sons, Ltd.
