Two-stage designs for experiments with a large number of hypotheses

Abstract
Motivation: The false discovery rate (FDR) is commonly applied when a large number of hypotheses are investigated, as in gene expression analysis or gene association studies. Conventional single-stage designs may lack power because the sample size available for each individual hypothesis is low. We propose two-stage designs in which the first stage is used to screen for the ‘promising’ hypotheses, which are then investigated further at the second stage with an increased sample size. A multiple test procedure based on sequential individual P-values is proposed to control the FDR for the case of independent normal distributions with known variance.

Results: The power of optimal two-stage designs is substantially larger than the power of the corresponding single-stage design with equal costs. Extensions to the case of unknown variances and correlated test statistics are investigated by simulations. Moreover, it is shown that the simple multiple test procedure that uses first-stage data only for screening and derives the test decisions from second-stage data alone is a very powerful option.

Availability: An R program is available at http://www.meduniwien.ac.at/medstat/research/fdr/application.R

Contact: Martin.Posch@meduniwien.ac.at

Supplementary information: Supplementary data for this paper are available at Bioinformatics online.
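The simple screening design mentioned in the abstract, where first-stage data only select hypotheses and test decisions rest on second-stage data alone, might be sketched as follows. This is an illustration under stated assumptions, not the authors' R program: the function names (`one_sided_p`, `benjamini_hochberg`, `two_stage_screen`), the screening threshold `gamma`, the per-hypothesis sample sizes `n1` and `n2`, the use of one-sided z-tests, and the choice of the Benjamini-Hochberg step-up procedure for the second stage are all hypothetical choices made for this example. The data are simulated as independent normal observations with known variance.

```python
import math
import random


def one_sided_p(z):
    """Upper-tail p-value of a standard normal test statistic."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def benjamini_hochberg(pvalues, alpha):
    """Return the indices rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0  # largest rank whose ordered p-value lies below its threshold
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m:
            k = rank
    return sorted(order[:k])


def two_stage_screen(means, n1, n2, gamma, alpha, sigma=1.0, rng=None):
    """Screen with stage-1 data, then test the survivors on stage-2 data only.

    Assumption: each hypothesis is H0: mean <= 0 against mean > 0, with
    independent normal observations and known standard deviation sigma.
    """
    rng = rng or random.Random(0)

    # Stage 1: keep hypotheses whose first-stage p-value is at most gamma.
    # The mean of n normal observations is drawn directly as N(mu, sigma^2 / n).
    selected = []
    for i, mu in enumerate(means):
        xbar1 = rng.gauss(mu, sigma / math.sqrt(n1))
        if one_sided_p(xbar1 * math.sqrt(n1) / sigma) <= gamma:
            selected.append(i)

    # Stage 2: fresh data for the selected hypotheses only; the first-stage
    # observations are discarded for the final test decision.
    p2 = []
    for i in selected:
        xbar2 = rng.gauss(means[i], sigma / math.sqrt(n2))
        p2.append(one_sided_p(xbar2 * math.sqrt(n2) / sigma))

    rejected = [selected[j] for j in benjamini_hochberg(p2, alpha)]
    return selected, rejected


if __name__ == "__main__":
    # Hypothetical scenario: 950 true nulls and 50 alternatives with mean 1.
    true_means = [0.0] * 950 + [1.0] * 50
    selected, rejected = two_stage_screen(
        true_means, n1=4, n2=12, gamma=0.1, alpha=0.05
    )
    print(len(selected), "hypotheses screened in;", len(rejected), "rejected")
```

Because the second-stage observations are independent of the first-stage selection, the second-stage p-values of true null hypotheses remain valid after screening, which is what makes this simple variant easy to reason about. The sketch makes no attempt to choose `n1`, `n2` or `gamma` optimally under a cost constraint.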