Filtering, FDR and power
Open Access
- 7 September 2010
- journal article
- Published by Springer Science and Business Media LLC in BMC Bioinformatics
- Vol. 11 (1), 450
- https://doi.org/10.1186/1471-2105-11-450
Abstract
Background: In high-dimensional data analysis such as differential gene expression analysis, people often use filtering methods like fold-change or variance filters in an attempt to reduce the multiple testing penalty and improve power. However, filtering may introduce a bias on the multiple testing correction. The precise amount of bias depends on many quantities, such as fraction of probes filtered out, filter statistic and test statistic used.
Results: We show that a biased multiple testing correction results if non-differentially expressed probes are not filtered out with equal probability from the entire range of p-values. We illustrate our results using both a simulation study and an experimental dataset, where the FDR is shown to be biased mostly by filters that are associated with the hypothesis being tested, such as the fold change. Filters that induce little bias on the FDR yield less additional power of detecting differentially expressed genes. Finally, we propose a statistical test that can be used in practice to determine whether any chosen filter introduces bias on the FDR estimate used, given a general experimental setup.
Conclusions: Filtering out of probes must be used with care as it may bias the multiple testing correction. Researchers can use our test for FDR bias to guide their choice of filter and amount of filtering in practice.
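The central claim of the abstract, that bias arises when null probes are not removed with equal probability across the whole range of p-values, can be illustrated with a small simulation. The sketch below is not the authors' simulation study or their proposed test. It assumes a simplified setup that does not come from the paper: every gene is non-differentially expressed, the data are standard normal with known variance (so a z-test stands in for the t-test), and each filter keeps the top half of genes. The function names and parameters are hypothetical.

```python
import math
import random


def simulate_null_genes(m, n, seed=0):
    """Simulate m non-differentially expressed genes, n samples per group.

    Returns one tuple per gene: (p_value, abs_mean_difference, overall_mean).
    Assumes unit variance, so a two-sided z-test replaces the t-test.
    """
    rng = random.Random(seed)
    se = math.sqrt(2.0 / n)  # standard error of the mean difference (sigma = 1)
    genes = []
    for _ in range(m):
        a = [rng.gauss(0.0, 1.0) for _ in range(n)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n)]
        mean_a, mean_b = sum(a) / n, sum(b) / n
        diff = mean_a - mean_b
        p = math.erfc(abs(diff / se) / math.sqrt(2.0))  # two-sided normal p-value
        genes.append((p, abs(diff), (mean_a + mean_b) / 2.0))
    return genes


def keep_top_half(genes, key_index):
    """Keep the half of the genes with the largest filter statistic."""
    ranked = sorted(genes, key=lambda g: g[key_index], reverse=True)
    return ranked[: len(ranked) // 2]


def fraction_below(genes, alpha=0.05):
    """Fraction of genes whose p-value falls below alpha."""
    return sum(1 for g in genes if g[0] < alpha) / len(genes)


genes = simulate_null_genes(m=20000, n=5, seed=1)

# No filter: null p-values are uniform, so about 5% fall below 0.05.
frac_unfiltered = fraction_below(genes)

# Fold-change-like filter: the filter statistic is tied to the test statistic,
# so only genes with small p-values survive.
frac_fold_change = fraction_below(keep_top_half(genes, key_index=1))

# Overall-mean filter: independent of the group difference under the null,
# so survivors are removed evenly across the p-value range.
frac_overall_mean = fraction_below(keep_top_half(genes, key_index=2))

print("unfiltered:         ", frac_unfiltered)
print("fold-change filter: ", frac_fold_change)
print("overall-mean filter:", frac_overall_mean)
```

Under these assumptions the fold-change filter discards only genes with large p-values, so the share of surviving null genes with p < 0.05 is twice the unfiltered share, and a correction that assumes uniform null p-values would understate the FDR. The overall-mean filter leaves that share near 5%. With an estimated variance and a t-test the relationship is less clear-cut, which is the setting the paper studies.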