Using Lasso for Predictor Selection and to Assuage Overfitting: A Method Long Overlooked in Behavioral Sciences
- 3 September 2015
- journal article
- research article
- Published by Taylor & Francis Ltd in Multivariate Behavioral Research
- Vol. 50 (5), 471-484
- https://doi.org/10.1080/00273171.2015.1036965
Abstract
Ordinary least squares and stepwise selection are widespread in behavioral science research; however, these methods are well known to encounter overfitting problems such that R² and regression coefficients may be inflated while standard errors and p values may be deflated, ultimately reducing both the parsimony of the model and the generalizability of conclusions. More optimal methods for selecting predictors and estimating regression coefficients, such as regularization methods (e.g., Lasso), have existed for decades, are widely implemented in other disciplines, and are available in mainstream software, yet these methods are essentially invisible in the behavioral science literature while the use of suboptimal methods continues to proliferate. This paper discusses potential issues with standard statistical models, provides an introduction to regularization with specific details on both Lasso and its related predecessor ridge regression, provides an example analysis and code for running a Lasso analysis in R and SAS, and discusses limitations and related methods.
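The paper's own R and SAS code is not reproduced in this record. As a rough illustration of the idea the abstract describes, the sketch below fits a Lasso by cyclic coordinate descent in plain Python and compares it with an unpenalized fit. Everything in it is an assumption made for illustration: the simulated data, the penalty values, the function names, and the objective (1/(2n))·Σ(y − Xb)² + λ·Σ|b_j|, which is one common parameterization and not necessarily the one used by the authors.

```python
import random

def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; values in [-gamma, gamma] become exactly 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0

def center(values):
    """Subtract the mean so that no intercept term is needed."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimize (1/(2n)) * sum((y - Xb)^2) + lam * sum(|b_j|).

    Assumes y and every column of X are already centered.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Covariance of predictor j with the residual that ignores predictor j.
            rho = 0.0
            for i in range(n):
                others = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - others)
            rho /= n
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
    return beta

# Simulated data (hypothetical): 5 predictors, only the first two affect y.
random.seed(0)
n, p = 60, 5
raw = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
cols = [center([row[j] for row in raw]) for j in range(p)]
X = [[cols[j][i] for j in range(p)] for i in range(n)]
y = center([2.0 * r[0] - 1.5 * r[1] + random.gauss(0, 1) for r in raw])

# Smallest penalty at which every coefficient is shrunk to exactly zero.
lam_max = max(abs(sum(X[i][j] * y[i] for i in range(n)) / n) for j in range(p))

beta_ols = lasso_coordinate_descent(X, y, lam=0.0)    # no penalty: least squares
beta_lasso = lasso_coordinate_descent(X, y, lam=0.3)  # moderate penalty
beta_null = lasso_coordinate_descent(X, y, lam=1.01 * lam_max)

print("unpenalized:", [round(b, 3) for b in beta_ols])
print("lasso      :", [round(b, 3) for b in beta_lasso])
print("lam > max  :", beta_null)
```

With a zero penalty the routine reduces to ordinary least squares, so every predictor receives a nonzero coefficient. Raising the penalty shrinks all coefficients and can set some to exactly zero, which is the sense in which Lasso selects predictors. In practice the penalty would be chosen by cross-validation, using an established implementation, not hand-picked as here.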