Variable Selection With Prior Information for Generalized Linear Models via the Prior LASSO Method
- 2 January 2016
- journal article
- theory and methods
- Published by Informa UK Limited in Journal of the American Statistical Association
- Vol. 111 (513), 355-376
- https://doi.org/10.1080/01621459.2015.1008363
Abstract
LASSO is a popular statistical tool often used in conjunction with generalized linear models that can simultaneously select variables and estimate parameters. When there are many variables of interest, as in current biological and biomedical studies, the power of LASSO can be limited. Fortunately, so much biological and biomedical data have been collected and they may contain useful information about the importance of certain variables. This paper proposes an extension of LASSO, namely, prior LASSO (pLASSO), to incorporate that prior information into penalized generalized linear models. The goal is achieved by adding in the LASSO criterion function an additional measure of the discrepancy between the prior information and the model. For linear regression, the whole solution path of the pLASSO estimator can be found with a procedure similar to the Least Angle Regression (LARS). Asymptotic theories and simulation results show that pLASSO provides significant improvement over LASSO when the prior information is relatively accurate. When the prior information is less reliable, pLASSO shows great robustness to the misspecification. We illustrate the application of pLASSO using a real data set from a genome-wide association study.
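The idea of adding a discrepancy term to the LASSO criterion can be illustrated for linear regression with a minimal sketch. It is not the paper's exact formulation. It assumes, for illustration only, that the prior information has been summarized as a vector of prior-predicted responses `y_prior`, and that the discrepancy is the squared error between the fitted values and `y_prior`, weighted by a tuning parameter `eta`. Under that assumption the criterion is (1/2n)||y - Xb||^2 + (eta/2n)||y_prior - Xb||^2 + lam*||b||_1, which reduces to an ordinary LASSO on a blended response. The names `prior_lasso_linear`, `y_prior`, `eta`, and `lam` are hypothetical, and plain coordinate descent is used here in place of the LARS-like path algorithm the abstract mentions.

```python
def soft_threshold(value, threshold):
    """Soft-thresholding operator used by LASSO coordinate descent."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def prior_lasso_linear(X, y, y_prior, lam, eta, n_iter=200):
    """Coordinate descent for an assumed prior-LASSO-style criterion:

        (1/2n)||y - Xb||^2 + (eta/2n)||y_prior - Xb||^2 + lam * ||b||_1

    X is a list of rows; y and y_prior are lists of length n.
    eta = 0 recovers the ordinary LASSO.
    """
    n, p = len(X), len(X[0])
    # The two quadratic terms combine into one weighted least-squares term
    # on a blended response (up to a constant that does not depend on b).
    target = [(yi + eta * pi) / (1.0 + eta) for yi, pi in zip(y, y_prior)]
    scale = (1.0 + eta) / n
    beta = [0.0] * p
    residual = list(target)  # equals target - X @ beta throughout
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            z = scale * sum(c * c for c in col)
            if z == 0.0:
                continue  # an all-zero column carries no information
            # Correlation of column j with the partial residual
            # (the residual with column j's own contribution added back).
            rho = scale * sum(c * (r + c * beta[j]) for c, r in zip(col, residual))
            new_value = soft_threshold(rho, lam) / z
            delta = new_value - beta[j]
            if delta != 0.0:
                residual = [r - c * delta for c, r in zip(col, residual)]
                beta[j] = new_value
    return beta


# Toy data: the second variable has a weak signal in y, but the
# (hypothetical) prior predictions point to it.
X = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
y = [2.0, 0.1, -2.0, -0.1]
y_prior = [0.0, 1.0, 0.0, -1.0]

beta_lasso = prior_lasso_linear(X, y, y_prior, lam=0.1, eta=0.0)
beta_prior = prior_lasso_linear(X, y, y_prior, lam=0.1, eta=1.0)
print(beta_lasso, beta_prior)
```

On this toy data the plain LASSO fit (`eta=0`) drops the second variable, while the fit with `eta=1` retains it. Larger `eta` places more trust in the prior predictions.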
This publication has 40 references indexed in Scilit:
- Prioritizing GWAS Results: A Review of Statistical Methods and Recommendations for Their Application. American Journal of Human Genetics, 2010
- Self-concordant analysis for logistic regression. Electronic Journal of Statistics, 2010
- Elastic-net regularization approaches for genome-wide association studies of rheumatoid arthritis. BMC Proceedings, 2009
- Simultaneous analysis of Lasso and Dantzig selector. The Annals of Statistics, 2009
- Dimension reduction and variable selection in case control studies via regularized likelihood optimization. Electronic Journal of Statistics, 2009
- Meta-analysis of two genome-wide association studies of bipolar disorder reveals important points of agreement. Molecular Psychiatry, 2008
- Honest variable selection in linear and logistic regression models via ℓ1 and ℓ1+ℓ2 penalization. Electronic Journal of Statistics, 2008
- A genome-wide association study implicates diacylglycerol kinase eta (DGKH) and several other genes in the etiology of bipolar disorder. Molecular Psychiatry, 2007
- Sparsity oracle inequalities for the Lasso. Electronic Journal of Statistics, 2007
- Genetics of affective (mood) disorders. European Journal of Human Genetics, 2006