Fewer permutations, more accurate P-values

Open Access

27 May 2009

journal article
research article
Published by Oxford University Press (OUP) in Bioinformatics

Vol. 25 (12), i161-i168
https://doi.org/10.1093/bioinformatics/btp211

Abstract

Motivation: Permutation tests have become a standard tool to assess the statistical significance of an event under investigation. The statistical significance, as expressed in a P-value, is calculated as the fraction of permutation values that are at least as extreme as the original statistic, which was derived from non-permuted data. This empirical method directly couples both the minimal obtainable P-value and the resolution of the P-value to the number of permutations. Thereby, it imposes upon itself the need for a very large number of permutations when small P-values are to be accurately estimated. This is computationally expensive and often infeasible.

Results: A method of computing P-values based on tail approximation is presented. The tail of the distribution of permutation values is approximated by a generalized Pareto distribution. A good fit and thus accurate P-value estimates can be obtained with a drastically reduced number of permutations when compared with the standard empirical way of computing P-values.

Availability: The Matlab code can be obtained from the corresponding author on request.

Contact: tknijnenburg@systemsbiology.org

Supplementary information: Supplementary data are available at Bioinformatics online.
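
To make the contrast in the abstract concrete, the following is a minimal Python sketch, not the authors' Matlab code. It compares the empirical P-value (whose smallest non-zero value is 1/N for N permutations) with a tail estimate based on a generalized Pareto distribution (GPD). Several choices here are assumptions made for illustration: the function names, the tail size `n_tail=250`, the threshold placement, and the use of a simple method-of-moments fit. The abstract does not state how the GPD is fitted, and a maximum-likelihood fit with a goodness-of-fit check would be a natural alternative.

```python
import math
import random
import statistics


def empirical_p_value(perm_values, observed):
    """Fraction of permutation values at least as extreme as the observed statistic."""
    exceed = sum(1 for v in perm_values if v >= observed)
    return exceed / len(perm_values)


def gpd_tail_p_value(perm_values, observed, n_tail=250):
    """Estimate P(X >= observed) from a GPD fitted to the n_tail largest values.

    Assumption: the GPD parameters are estimated by the method of moments,
    chosen only to keep this sketch dependency-free.
    """
    ordered = sorted(perm_values)
    n = len(ordered)
    # Threshold halfway between the last non-tail value and the first tail value.
    threshold = 0.5 * (ordered[n - n_tail - 1] + ordered[n - n_tail])
    if observed <= threshold:
        # The statistic is not in the tail, so the empirical estimate is adequate.
        return empirical_p_value(perm_values, observed)

    excesses = [v - threshold for v in ordered[n - n_tail:]]
    mean = statistics.fmean(excesses)
    var = statistics.variance(excesses)
    ratio = mean * mean / var
    xi = 0.5 * (1.0 - ratio)            # shape parameter
    sigma = 0.5 * mean * (1.0 + ratio)  # scale parameter

    z = observed - threshold
    if abs(xi) < 1e-9:
        survival = math.exp(-z / sigma)  # exponential limit of the GPD
    else:
        base = 1.0 + xi * z / sigma
        # For xi < 0 the fitted tail has a finite upper end; beyond it the estimate is 0.
        survival = base ** (-1.0 / xi) if base > 0 else 0.0
    # Scale by the fraction of permutation values that lie in the tail.
    return (n_tail / n) * survival


if __name__ == "__main__":
    rng = random.Random(0)
    # Stand-in for permutation values: exponential draws, not real permutation data.
    perms = [rng.expovariate(1.0) for _ in range(5000)]
    observed = 12.0
    print("empirical:", empirical_p_value(perms, observed))
    print("GPD tail :", gpd_tail_p_value(perms, observed))
```

With 5000 stand-in values, the empirical estimate cannot resolve anything below 1/5000, whereas the tail estimate extrapolates beyond the largest sampled value. How far that extrapolation can be trusted depends on the quality of the fit, which this sketch does not assess.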
