Statistical challenges of high-dimensional data

13 November 2009

journal article
other
Published by The Royal Society in Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences

Vol. 367 (1906), 4237-4253
https://doi.org/10.1098/rsta.2009.0159

Abstract

Modern applications of statistical theory and methods can involve extremely large datasets, often with huge numbers of measurements on each of a comparatively small number of experimental units. New methodology and accompanying theory have emerged in response: the goal of this Theme Issue is to illustrate a number of these recent developments. This overview article introduces the difficulties that arise with high-dimensional data in the context of the very familiar linear statistical model: we give a taste of what can nevertheless be achieved when the parameter vector of interest is sparse, that is, contains many zero elements. We describe other ways of identifying low-dimensional subspaces of the data space that contain all useful information. The topic of classification is then reviewed along with the problem of identifying, from within a very large set, the variables that help to classify observations. Brief mention is made of the visualization of high-dimensional data, and ways to handle computational problems in Bayesian analysis are described. At appropriate points, reference is made to the other papers in the issue.
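
As a rough illustration of the difficulty the abstract alludes to, the sketch below simulates a linear model with far more measurements than experimental units and a sparse parameter vector. It is not taken from the article: the sizes `n`, `p`, `k`, the coefficient values and the noise level are all assumptions chosen for demonstration. It shows that when p > n the design matrix has rank at most n, so many different coefficient vectors fit the observations equally well and least squares alone cannot identify the sparse truth.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: n experimental units, p measurements each, k nonzero effects.
n, p, k = 20, 200, 3

X = rng.standard_normal((n, p))           # design matrix with p >> n
beta = np.zeros(p)                        # sparse parameter vector
beta[:k] = [3.0, -2.0, 1.5]               # illustrative nonzero coefficients
y = X @ beta + 0.1 * rng.standard_normal(n)

# X (and hence X^T X) has rank at most n, so X^T X is singular when p > n.
rank = np.linalg.matrix_rank(X)

# Minimum-norm least-squares solution via the pseudoinverse: fits y exactly.
beta_mn = np.linalg.pinv(X) @ y

# Any vector in the null space of X can be added without changing the fit.
_, _, Vt = np.linalg.svd(X)               # full SVD: Vt has shape (p, p)
null_vec = Vt[-1]                         # a direction with X @ null_vec ~ 0
beta_alt = beta_mn + 5.0 * null_vec

print("rank of X:", rank, "out of p =", p)
print("both fits exact:",
      np.allclose(X @ beta_mn, y), np.allclose(X @ beta_alt, y))
print("solutions differ:", not np.allclose(beta_mn, beta_alt))
```

Extra structure such as sparsity, or a penalty on the coefficients, is what removes this ambiguity; the sketch only demonstrates the ambiguity itself.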

This publication has 42 references indexed in Scilit.

Cited by 260 articles