Statistical challenges of high-dimensional data
- 13 November 2009
- journal article
- other
- Published by The Royal Society in Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences
- Vol. 367 (1906), 4237-4253
- https://doi.org/10.1098/rsta.2009.0159
Abstract
Modern applications of statistical theory and methods can involve extremely large datasets, often with huge numbers of measurements on each of a comparatively small number of experimental units. New methodology and accompanying theory have emerged in response: the goal of this Theme Issue is to illustrate a number of these recent developments. This overview article introduces the difficulties that arise with high-dimensional data in the context of the very familiar linear statistical model: we give a taste of what can nevertheless be achieved when the parameter vector of interest is sparse, that is, contains many zero elements. We describe other ways of identifying low-dimensional subspaces of the data space that contain all useful information. The topic of classification is then reviewed along with the problem of identifying, from within a very large set, the variables that help to classify observations. Brief mention is made of the visualization of high-dimensional data and ways to handle computational problems in Bayesian analysis are described. At appropriate points, reference is made to the other papers in the issue.