Global testing under sparse alternatives: ANOVA, multiple comparisons and the higher criticism
Open Access
- 1 October 2011
- journal article
- Published by Institute of Mathematical Statistics in The Annals of Statistics
- Vol. 39 (5)
- https://doi.org/10.1214/11-aos910
Abstract
Testing for the significance of a subset of regression coefficients in a linear model, a staple of statistical analysis, goes back at least to the work of Fisher, who introduced the analysis of variance (ANOVA). We study this problem under the assumption that the coefficient vector is sparse, a common situation in modern high-dimensional settings. Suppose the regression vector is of dimension p with S nonzero coefficients, where S = p^{1 - alpha}. Under moderate sparsity levels, i.e., alpha <= 1/2, we show that ANOVA is essentially optimal under some conditions on the design. This is no longer the case under strong sparsity constraints, i.e., alpha > 1/2. In such settings, a multiple comparison procedure is often preferred, and we establish its optimality when alpha >= 3/4. However, these two very popular methods are suboptimal, and sometimes powerless, under moderately strong sparsity where 1/2 < alpha < 3/4. We suggest a method based on the Higher Criticism that is powerful in the whole range alpha > 1/2. This optimality property is true for a variety of designs, including the classical (balanced) multi-way designs and more modern 'p > n' designs arising in genetics and signal processing. In addition to the standard fixed effects model, we establish similar results for a random effects model where the nonzero coefficients of the regression vector are normally distributed.
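As a rough illustration of the Higher Criticism idea mentioned in the abstract, the sketch below computes the statistic in its commonly cited Donoho-Jin form, HC = max over i <= alpha0 * n of sqrt(n) * (i/n - p_(i)) / sqrt(p_(i) * (1 - p_(i))), from sorted two-sided p-values. This is an assumption made for illustration: the abstract does not state the exact variant the authors analyze, and the simulation assumes an orthogonal design so that each coefficient yields an independent z-score. The function names, the truncation fraction `alpha0`, and the `amplitude` parameter are hypothetical.

```python
import math
import random


def two_sided_p_values(z_scores):
    """Two-sided p-values of standard normal z-scores: P(|Z| > |z|) = erfc(|z| / sqrt(2))."""
    return [math.erfc(abs(z) / math.sqrt(2.0)) for z in z_scores]


def higher_criticism(p_values, alpha0=0.5):
    """Donoho-Jin style Higher Criticism statistic (illustrative form, see lead-in).

    Scans the smallest alpha0 fraction of the sorted p-values and returns the
    largest standardized excess of the empirical fraction i/n over p_(i).
    """
    n = len(p_values)
    sorted_p = sorted(p_values)
    limit = max(1, int(alpha0 * n))
    best = -math.inf
    for i, p in enumerate(sorted_p[:limit], start=1):
        if p <= 0.0 or p >= 1.0:
            continue  # the standardization is undefined at the endpoints
        stat = math.sqrt(n) * (i / n - p) / math.sqrt(p * (1.0 - p))
        best = max(best, stat)
    return best


def simulate_z_scores(p_dim, alpha, amplitude, seed):
    """Draw p_dim independent N(0, 1) z-scores and shift S = round(p_dim^(1 - alpha)) of them.

    Assumption: an orthogonal design, so z-scores are independent across coordinates.
    """
    rng = random.Random(seed)
    s = max(1, round(p_dim ** (1.0 - alpha)))
    z = [rng.gauss(0.0, 1.0) for _ in range(p_dim)]
    for j in rng.sample(range(p_dim), s):
        z[j] += amplitude
    return z


if __name__ == "__main__":
    # alpha = 0.6 lies in the moderately strong sparsity range 1/2 < alpha < 3/4.
    null_z = simulate_z_scores(10_000, alpha=0.6, amplitude=0.0, seed=1)
    signal_z = simulate_z_scores(10_000, alpha=0.6, amplitude=4.0, seed=1)
    print("HC under the null:   ", higher_criticism(two_sided_p_values(null_z)))
    print("HC with sparse signal:", higher_criticism(two_sided_p_values(signal_z)))
```

In practice the statistic would be compared against a threshold calibrated under the null, for example by simulation; the sketch only computes the statistic and does not choose that threshold.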