Abstract
Simulation was used to evaluate the performance of several methods of variable selection in regression modeling: stepwise regression based on partial F-tests, stepwise minimization of Mallows’ Cp statistic and of Schwarz’s Bayesian Information Criterion (BIC), and regression trees constructed with two kinds of pruning. Five to 25 covariates were generated in multivariate clusters, and responses were obtained from an ordinary linear regression model involving three of the covariates; each data set had 50 observations. The regression-tree approaches were markedly inferior to the other methods in discriminating between informative and noninformative covariates, and their predictions of responses in “new” data sets were much more variable and less accurate than those of the other methods. The F-test, Cp, and BIC approaches were similar in their overall frequencies of “correct” decisions about inclusion or exclusion of covariates, with the Cp method leading to the largest models and the BIC method to the smallest. The three methods were also comparable in their ability to predict “new” observations, with perhaps a tendency for the Cp approach to perform relatively poorly for large covariate pools. The ability of all methods to discriminate between informative and noninformative covariates and to predict “new” observations decreased with increasing size of the covariate pool.
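For reference, the two stepwise selection criteria named above take their standard forms; the following is a sketch of the definitions under the usual notation (n observations, p parameters in the candidate model, SSE_p its residual sum of squares, and σ̂² the error-variance estimate from the full model):

$$C_p = \frac{\mathrm{SSE}_p}{\hat{\sigma}^2} - n + 2p, \qquad \mathrm{BIC} = n \ln\!\left(\frac{\mathrm{SSE}_p}{n}\right) + p \ln n.$$

A model is favored when Cp is small (or close to p) and when BIC is minimal. Roughly speaking, Cp carries an effective penalty of 2p per model, while BIC’s penalty of p ln n is heavier whenever n ≥ 8 (here n = 50), which is consistent with the finding that BIC selected the smallest models.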