Fewer permutations, more accurate P-values
Open Access
- 27 May 2009
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 25 (12), i161-i168
- https://doi.org/10.1093/bioinformatics/btp211
Abstract
Motivation: Permutation tests have become a standard tool for assessing the statistical significance of an event under investigation. The statistical significance, expressed as a P-value, is calculated as the fraction of permutation values that are at least as extreme as the original statistic, which is derived from the non-permuted data. This empirical method ties both the smallest obtainable P-value and the resolution of the P-value directly to the number of permutations: with N permutations, neither can be finer than 1/N. Consequently, accurately estimating small P-values requires a very large number of permutations, which is computationally expensive and often infeasible.

Results: A method of computing P-values based on tail approximation is presented. The tail of the distribution of permutation values is approximated by a generalized Pareto distribution. A good fit, and thus accurate P-value estimates, can be obtained with drastically fewer permutations than the standard empirical way of computing P-values requires.

Availability: The Matlab code can be obtained from the corresponding author on request.

Contact: tknijnenburg@systemsbiology.org

Supplementary information: Supplementary data are available at Bioinformatics online.
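The Python sketch below contrasts the two ways of turning permutation values into a P-value that the abstract describes. It is illustrative only and is not the authors' Matlab code. Several assumptions do not come from the source: the test is one-sided in the upper tail; permutation values are generated with a difference-in-means statistic; the helper names `permutation_values`, `empirical_p` and `gpd_tail_p` and the settings `n_exceed=250` and `min_count=10` are hypothetical; the tail threshold t is placed midway between the `n_exceed`-th and (`n_exceed`+1)-th largest permutation values; and the generalized Pareto distribution (GPD) is fitted by the method of moments, which may differ from the fitting procedure used in the paper. The tail estimate uses the usual peaks-over-threshold form P ≈ (N_exc/N)·(1 − F(x0 − t)), where N is the number of permutations, N_exc is the number of permutation values above t, and F is the fitted GPD cumulative distribution function. The sketch does not check the goodness of the GPD fit.

```python
import math
import random


def permutation_values(group_a, group_b, n_perm, seed=0):
    """Recompute a statistic on n_perm random relabelings of the pooled data.

    The difference in group means is a hypothetical choice of statistic.
    """
    rng = random.Random(seed)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    values = []
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random reassignment of group labels
        a, b = pooled[:n_a], pooled[n_a:]
        values.append(sum(a) / len(a) - sum(b) / len(b))
    return values


def empirical_p(x0, perm_values):
    """Fraction of permutation values at least as large as x0 (upper tail).

    With N values the result is a multiple of 1/N, so 1/N is both the
    resolution and the smallest nonzero P-value this estimate can report.
    """
    count = sum(1 for v in perm_values if v >= x0)
    return count / len(perm_values)


def gpd_tail_p(x0, perm_values, n_exceed=250, min_count=10):
    """Peaks-over-threshold P-value from a generalized Pareto fit to the tail.

    n_exceed and min_count are illustrative defaults, not values from the paper.
    The GPD is fitted by the method of moments, which needs shape xi < 1/2.
    """
    n = len(perm_values)
    if n <= n_exceed:
        raise ValueError("need more permutation values than n_exceed")
    # Enough permutation values reach x0: keep the empirical estimate.
    if sum(1 for v in perm_values if v >= x0) >= min_count:
        return empirical_p(x0, perm_values)

    ordered = sorted(perm_values, reverse=True)
    # Threshold t midway between the n_exceed-th and (n_exceed+1)-th largest values.
    t = 0.5 * (ordered[n_exceed - 1] + ordered[n_exceed])
    exceed = [v - t for v in ordered[:n_exceed]]

    # Method of moments: mean = sigma/(1-xi), var = sigma^2/((1-xi)^2 (1-2xi)).
    mean = sum(exceed) / n_exceed
    var = sum((z - mean) ** 2 for z in exceed) / (n_exceed - 1)
    ratio = mean * mean / var
    xi = 0.5 * (1.0 - ratio)            # shape
    sigma = 0.5 * mean * (1.0 + ratio)  # scale

    # GPD survival function 1 - F(z) evaluated at z = x0 - t.
    z0 = x0 - t
    if abs(xi) < 1e-12:
        survival = math.exp(-z0 / sigma)
    else:
        base = 1.0 + xi * z0 / sigma
        # For xi < 0 the GPD has a finite upper endpoint; beyond it the tail mass is 0.
        survival = base ** (-1.0 / xi) if base > 0 else 0.0
    # Fraction of values beyond t, times the fitted conditional tail probability.
    return (n_exceed / n) * survival


if __name__ == "__main__":
    # Stand-in "permutation values": evenly spaced quantiles of an Exp(1) distribution.
    n = 10_000
    values = [-math.log(1.0 - (i - 0.5) / n) for i in range(1, n + 1)]
    x0 = 15.0  # larger than every stand-in value
    print("empirical:", empirical_p(x0, values))
    print("GPD tail:", gpd_tail_p(x0, values))
    print("exact Exp(1) tail:", math.exp(-x0))
```

Run as a script, the demo uses evenly spaced Exp(1) quantiles as stand-in permutation values. Because x0 = 15 exceeds all 10,000 of them, the empirical estimate is exactly 0, whereas the tail fit returns a small positive value that can be compared with the exact Exp(1) tail probability exp(−15). The method-of-moments fit was chosen here only because it needs no numerical optimizer; maximum-likelihood fitting is a common alternative.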