Adaptive penalization in high-dimensional regression and classification with external covariates using variational Bayes

Open Access

9 October 2019

journal article
research article
Published by Oxford University Press (OUP) in Biostatistics

Vol. 22 (2), 348-364
https://doi.org/10.1093/biostatistics/kxz034

Abstract

Penalization schemes like Lasso or ridge regression are routinely used to regress a response of interest on a high-dimensional set of potential predictors. Although it is decisive, the relative strength of penalization across predictors is often glossed over and determined only implicitly by the scale of the individual predictors. At the same time, additional information on the predictors is available in many applications but left unused. Here, we propose to make use of such external covariates to adapt the penalization in a data-driven manner. We present a method that differentially penalizes feature groups defined by the covariates and adapts the relative strength of penalization to the information content of each group. Using techniques from the Bayesian toolset, our procedure combines shrinkage with feature selection and provides a scalable optimization scheme. We demonstrate in simulations that the method accurately recovers the true effect sizes and sparsity patterns per feature group. Furthermore, it leads to improved prediction performance in situations where the groups have strong differences in dynamic range. In applications to data from high-throughput biology, the method enables re-weighting the importance of feature groups from different assays. Overall, using available covariates extends the range of applications of penalized regression, improves model interpretability and can improve prediction performance.
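The idea of differentially penalizing feature groups can be illustrated with a minimal sketch in Python. This is not the variational Bayes procedure of the article: it is plain ridge regression with one fixed, hand-chosen penalty per group, whereas the article's method learns the relative penalty strengths from the data. The function name `group_ridge`, the simulated data and the penalty values are all assumptions made for illustration.

```python
import numpy as np

def group_ridge(X, y, groups, penalties):
    """Ridge regression with one penalty strength per feature group.

    groups    : integer array of length p giving each feature's group index
    penalties : sequence where penalties[g] is the penalty for group g
    (Hypothetical helper for illustration; penalties are fixed, not learned.)
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Expand the per-group penalties to one penalty per feature.
    lam = np.asarray(penalties, dtype=float)[np.asarray(groups)]
    # Closed-form solution of (X^T X + diag(lam)) beta = X^T y.
    return np.linalg.solve(X.T @ X + np.diag(lam), X.T @ y)

# Simulated data: group 0 carries signal, group 1 is pure noise.
rng = np.random.default_rng(0)
n, p = 50, 20
X = rng.normal(size=(n, p))
groups = np.repeat([0, 1], p // 2)
beta_true = np.concatenate([rng.normal(size=p // 2), np.zeros(p // 2)])
y = X @ beta_true + 0.1 * rng.normal(size=n)

# Weak penalty on the informative group, strong penalty on the noise group.
beta_hat = group_ridge(X, y, groups, penalties=[0.1, 100.0])
```

With equal penalties for all groups, the sketch reduces to ordinary ridge regression. Choosing the penalties by hand, as done here, is exactly the step that the article proposes to replace with a data-driven estimate per group.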

Funding Information

European Union Horizon 2020
EMBL International

