Using Lasso for Predictor Selection and to Assuage Overfitting: A Method Long Overlooked in Behavioral Sciences

3 September 2015

journal article
research article
Published by Taylor & Francis Ltd in Multivariate Behavioral Research

Vol. 50 (5), 471-484
https://doi.org/10.1080/00273171.2015.1036965

Abstract

Ordinary least squares and stepwise selection are widespread in behavioral science research; however, these methods are well known to encounter overfitting problems such that R² and regression coefficients may be inflated while standard errors and p values may be deflated, ultimately reducing both the parsimony of the model and the generalizability of conclusions. More optimal methods for selecting predictors and estimating regression coefficients, such as regularization methods (e.g., Lasso), have existed for decades, are widely implemented in other disciplines, and are available in mainstream software, yet these methods are essentially invisible in the behavioral science literature while the use of suboptimal methods continues to proliferate. This paper discusses potential issues with standard statistical models, provides an introduction to regularization with specific details on both Lasso and its related predecessor ridge regression, provides an example analysis and code for running a Lasso analysis in R and SAS, and discusses limitations and related methods.
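
The contrast between Lasso and ridge regression mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's R or SAS code. It assumes an orthonormal design (predictors standardized and uncorrelated), in which case each penalized coefficient has a closed form in terms of the OLS coefficient. The function names `soft_threshold` and `ridge_shrink`, the penalty scaling, and the coefficient values are hypothetical and chosen only for illustration.

```python
# Assumption: orthonormal predictors (X'X = I), so each coefficient can be
# treated separately. Penalty scaling assumed here:
#   lasso: 0.5 * (z - b)**2 + lam * abs(b)
#   ridge: 0.5 * (z - b)**2 + 0.5 * lam * b**2
# where z is the OLS coefficient and b the penalized coefficient.


def soft_threshold(z: float, lam: float) -> float:
    """Lasso coefficient: shift z toward zero by lam, clipping at exactly zero."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def ridge_shrink(z: float, lam: float) -> float:
    """Ridge coefficient: scale z toward zero; never exactly zero unless z is."""
    return z / (1.0 + lam)


# Illustrative OLS coefficients (made up, not taken from the paper).
ols = [2.0, -1.25, 0.25]
lam = 0.5

lasso = [soft_threshold(b, lam) for b in ols]
ridge = [ridge_shrink(b, lam) for b in ols]

print("lasso:", lasso)
print("ridge:", ridge)
```

Under these assumptions the smallest coefficient is set exactly to zero by the Lasso penalty, which is what makes it usable for predictor selection, whereas the ridge penalty shrinks every coefficient but keeps all predictors in the model.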

Cited by 232 articles