Sparse Principal Component Analysis
- 1 June 2006
- journal article
- research article
- Published by Taylor & Francis Ltd in Journal of Computational and Graphical Statistics
- Vol. 15 (2), 265-286
- https://doi.org/10.1198/106186006x113430
Abstract
Principal component analysis (PCA) is widely used in data processing and dimensionality reduction. However, PCA suffers from the fact that each principal component is a linear combination of all the original variables, thus it is often difficult to interpret the results. We introduce a new method called sparse principal component analysis (SPCA) using the lasso (elastic net) to produce modified principal components with sparse loadings. We first show that PCA can be formulated as a regression-type optimization problem; sparse loadings are then obtained by imposing the lasso (elastic net) constraint on the regression coefficients. Efficient algorithms are proposed to fit our SPCA models for both regular multivariate data and gene expression arrays. We also give a new formula to compute the total variance of modified principal components. As illustrations, SPCA is applied to real and simulated data with encouraging results.
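The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea: alternate between a regression-like step that shrinks the loadings and an orthogonalization step. As a stated simplification, the elastic net fit is replaced here by plain soft-thresholding, and the function names, the `l1_penalty` parameter, and the toy data are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np


def soft_threshold(z, t):
    """Shrink every entry of z toward zero by t, clipping at zero."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def sparse_pca_sketch(X, k, l1_penalty, n_iter=100):
    """Return a (p, k) matrix of unit-norm loadings (hypothetical helper).

    Assumption: soft-thresholding stands in for the elastic net fit.
    """
    X = X - X.mean(axis=0)              # center each variable
    gram = X.T @ X                      # p x p cross-product matrix
    # Start from the ordinary PCA loadings (top-k right singular vectors).
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    A = vt[:k].T
    B = A.copy()
    for _ in range(n_iter):
        # Regression-like step: with A fixed, shrink the coefficients.
        B = soft_threshold(gram @ A, l1_penalty / 2.0)
        # Orthogonal step: with B fixed, take the polar factor of gram @ B.
        U, _, Wt = np.linalg.svd(gram @ B, full_matrices=False)
        A = U @ Wt
    norms = np.linalg.norm(B, axis=0)
    norms[norms == 0.0] = 1.0           # leave all-zero columns as zeros
    return B / norms


# Toy data (an assumption for illustration): two groups of three variables,
# each group driven by its own latent factor plus a little noise.
rng = np.random.default_rng(0)
f1 = 3.0 * rng.standard_normal(100)
f2 = rng.standard_normal(100)
X_toy = np.column_stack([f1, f1, f1, f2, f2, f2])
X_toy = X_toy + 0.2 * rng.standard_normal((100, 6))

print(np.round(sparse_pca_sketch(X_toy, k=2, l1_penalty=0.0), 3))
print(np.round(sparse_pca_sketch(X_toy, k=2, l1_penalty=200.0), 3))
```

With `l1_penalty=0.0` the sketch returns the ordinary PCA loadings, in which every variable typically carries a nonzero weight; larger penalties tend to set more loadings exactly to zero, which is the interpretability gain the abstract describes.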