Bias in the estimation of false discovery rate in microarray studies

Abstract
Motivation: The false discovery rate (FDR) provides a key statistical assessment for microarray studies. Its value depends on the proportion π0 of non-differentially expressed (non-DE) genes. In most microarray studies, many genes have small effects that are not easily separable from non-DE genes. As a result, current methods often overestimate π0 and hence the FDR, leading to an unnecessary loss of power in the overall analysis.
Methods: For the common two-sample comparison we derive a natural mixture model of the test statistic and an explicit formula for the bias of the standard estimate of π0. We suggest an improved estimate of π0 based on the mixture model and describe a practical likelihood-based procedure for computing it.
Results: The analysis shows that a large bias occurs when π0 is far from 1 and when the non-centrality parameters of the distribution of the test statistic are near zero. The theoretical result also explains substantial discrepancies between non-parametric and model-based estimates of π0. Simulation studies indicate that mixture-model estimates are less biased than standard estimates. The method is applied to breast cancer and lymphoma data examples.
Availability: An R package, OCplus, containing functions to compute π0 based on the mixture model, the resulting FDR and other operating characteristics of microarray data, is freely available at http://www.meb.ki.se/~yudpaw
Contact: yudi.pawitan@meb.ki.se and alexander.ploner@meb.ki.se
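
The direction of the bias described above can be illustrated with a small simulation. This is a minimal sketch under simplifying assumptions, not the mixture-model procedure of the paper and not code from OCplus: the test statistics are taken to be z-statistics with unit variance rather than two-sample t-statistics, every DE gene is given the same non-centrality parameter ±δ, and the "standard" estimate is assumed to be the common threshold-based one (the fraction of p-values above a cut-off λ, divided by 1 − λ). All function names are hypothetical.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for p-values and quantiles


def two_sided_p(z):
    """Two-sided p-value of a z-statistic under the null N(0, 1)."""
    return 2.0 * (1.0 - STD_NORMAL.cdf(abs(z)))


def threshold_pi0(pvalues, lam=0.5):
    """Assumed 'standard' estimate: share of p-values above lam, scaled by 1/(1 - lam)."""
    above = sum(1 for p in pvalues if p > lam)
    return above / ((1.0 - lam) * len(pvalues))


def simulate_pi0(pi0, delta, n_genes=20000, lam=0.5, seed=1):
    """Draw z-statistics from a two-component mixture and return the threshold estimate.

    Non-DE genes: z ~ N(0, 1). DE genes: z ~ N(+delta, 1) or N(-delta, 1), equally likely.
    """
    rng = random.Random(seed)
    pvalues = []
    for _ in range(n_genes):
        if rng.random() < pi0:
            mean = 0.0  # non-DE gene
        else:
            mean = delta if rng.random() < 0.5 else -delta  # DE gene
        pvalues.append(two_sided_p(rng.gauss(mean, 1.0)))
    return threshold_pi0(pvalues, lam)


def expected_pi0(pi0, delta, lam=0.5):
    """Expectation of the threshold estimate under the same mixture.

    A DE gene has p > lam exactly when |z| < c, with c the (1 - lam/2) normal quantile,
    so the estimate is inflated by (1 - pi0) * P(|z| < c | delta) / (1 - lam).
    """
    c = STD_NORMAL.inv_cdf(1.0 - lam / 2.0)
    p_de_above = STD_NORMAL.cdf(c - delta) - STD_NORMAL.cdf(-c - delta)
    return pi0 + (1.0 - pi0) * p_de_above / (1.0 - lam)


if __name__ == "__main__":
    for delta in (0.5, 1.0, 2.0, 4.0):
        est = simulate_pi0(0.6, delta)
        exp = expected_pi0(0.6, delta)
        print(f"delta={delta}: simulated={est:.3f}, expected={exp:.3f}, true pi0=0.600")
```

In this toy setting the overestimation grows as δ approaches zero and as the true π0 moves away from 1, since the inflation term is proportional to 1 − π0; for large δ the estimate settles near the true value.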