Adaptive penalization in high-dimensional regression and classification with external covariates using variational Bayes
Open Access
- 9 October 2019
- journal article
- research article
- Published by Oxford University Press (OUP) in Biostatistics
- Vol. 22 (2), 348-364
- https://doi.org/10.1093/biostatistics/kxz034
Abstract
Penalization schemes like Lasso or ridge regression are routinely used to regress a response of interest on a high-dimensional set of potential predictors. Despite being decisive, the question of the relative strength of penalization is often glossed over and only implicitly determined by the scale of individual predictors. At the same time, additional information on the predictors is available in many applications but left unused. Here, we propose to make use of such external covariates to adapt the penalization in a data-driven manner. We present a method that differentially penalizes feature groups defined by the covariates and adapts the relative strength of penalization to the information content of each group. Using techniques from the Bayesian tool-set, our procedure combines shrinkage with feature selection and provides a scalable optimization scheme. We demonstrate in simulations that the method accurately recovers the true effect sizes and sparsity patterns per feature group. Furthermore, it leads to an improved prediction performance in situations where the groups have strong differences in dynamic range. In applications to data from high-throughput biology, the method enables re-weighting the importance of feature groups from different assays. Overall, using available covariates extends the range of applications of penalized regression, improves model interpretability and can improve prediction performance.
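To make the idea of differential penalization concrete, the following is a minimal sketch of ridge regression with one penalty per feature group. It is not the variational Bayes procedure of the article: the penalties here are fixed by hand rather than learned from the data, and the function name `group_ridge`, the simulated data and the penalty values are all assumptions chosen for illustration.

```python
import numpy as np


def group_ridge(X, y, groups, penalties):
    """Ridge regression with a separate penalty for each feature group.

    Hypothetical helper for illustration: `groups` gives a group label per
    feature (column of X), and `penalties` maps each label to its penalty.
    """
    groups = np.asarray(groups)
    # Expand the per-group penalties to one penalty per feature.
    lam = np.array([penalties[g] for g in groups], dtype=float)
    # Closed-form solution of (X^T X + diag(lam)) beta = X^T y.
    return np.linalg.solve(X.T @ X + np.diag(lam), X.T @ y)


# Simulated data (assumed setup): group 0 carries signal, group 1 is noise.
rng = np.random.default_rng(0)
n, p = 100, 40
X = rng.standard_normal((n, p))
groups = np.repeat([0, 1], p // 2)
beta_true = np.where(groups == 0, 1.0, 0.0)
y = X @ beta_true + rng.standard_normal(n)

# Penalize the uninformative group much more strongly than the informative one.
beta_hat = group_ridge(X, y, groups, {0: 1.0, 1: 1000.0})
norm_informative = np.linalg.norm(beta_hat[groups == 0])
norm_noise = np.linalg.norm(beta_hat[groups == 1])
```

With equal penalties for all groups this reduces to ordinary ridge regression. The method described in the abstract goes further by adapting the group-wise penalties to the information content of each group instead of requiring them as inputs.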
Funding Information
- European Union Horizon 2020
- EMBL International