Model selection procedures in social research: Monte-Carlo simulation results
- 15 August 2008
- journal article
- research article
- Published by Taylor & Francis Ltd in Journal of Applied Statistics
- Vol. 35 (10), 1093-1114
- https://doi.org/10.1080/03081070802203959
Abstract
Model selection strategies play an important, if not explicit, role in quantitative research. The inferential properties of these strategies are largely unknown; therefore, there is little basis for recommending (or avoiding) any particular set of strategies. In this paper, we evaluate several commonly used model selection procedures [Bayesian information criterion (BIC), adjusted R², Mallows' Cp, Akaike information criterion (AIC), AICc, and stepwise regression] using Monte-Carlo simulation of model selection when the true data generating processes (DGP) are known. We find that the ability of these selection procedures to include important variables and exclude irrelevant variables increases with the size of the sample and decreases with the amount of noise in the model. None of the model selection procedures do well in small samples, even when the true DGP is largely deterministic; thus, data mining in small samples should be avoided entirely. Instead, the implicit uncertainty in model specification should be explicitly discussed. In large samples, BIC is better than the other procedures at correctly identifying most of the generating processes we simulated, and stepwise regression does almost as well. In the absence of strong theory, both BIC and stepwise regression appear to be reasonable model selection strategies in large samples. Under the conditions simulated, adjusted R², Mallows' Cp, AIC, and AICc are clearly inferior and should be avoided.
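The kind of experiment the abstract describes can be illustrated with a minimal Python sketch. It is not the authors' simulation design: the DGP (two relevant and two irrelevant standard-normal predictors with unit coefficients), the all-subsets search, and the names `select` and `recovery_rates` are assumptions made for illustration only. AIC and BIC are computed in their common Gaussian least-squares form, n·log(RSS/n) plus a penalty of 2 or log(n) per estimated coefficient.

```python
from itertools import combinations

import numpy as np


def select(X, y, penalty):
    """Return the predictor subset minimizing n*log(RSS/n) + penalty*k."""
    n, p = X.shape
    best_score, best_subset = np.inf, ()
    for size in range(p + 1):
        for subset in combinations(range(p), size):
            # Design matrix: intercept plus the chosen candidate columns.
            design = np.hstack([np.ones((n, 1)), X[:, list(subset)]])
            beta, *_ = np.linalg.lstsq(design, y, rcond=None)
            resid = y - design @ beta
            rss = float(resid @ resid)
            k = design.shape[1]  # number of estimated coefficients
            score = n * np.log(rss / n) + penalty * k
            if score < best_score:
                best_score, best_subset = score, subset
    return best_subset


def recovery_rates(n, sigma, n_reps=200, seed=0):
    """Share of replications in which each criterion picks exactly the true subset."""
    rng = np.random.default_rng(seed)
    true_subset = (0, 1)  # hypothetical DGP: only the first two predictors matter
    hits = {"AIC": 0, "BIC": 0}
    for _ in range(n_reps):
        X = rng.standard_normal((n, 4))
        y = 1.0 + X[:, 0] + X[:, 1] + sigma * rng.standard_normal(n)
        for name, penalty in (("AIC", 2.0), ("BIC", np.log(n))):
            if select(X, y, penalty) == true_subset:
                hits[name] += 1
    return {name: count / n_reps for name, count in hits.items()}


if __name__ == "__main__":
    # Vary sample size and noise level, the two factors the abstract highlights.
    for n in (20, 500):
        for sigma in (1.0, 3.0):
            print(n, sigma, recovery_rates(n, sigma))
```

Calling `recovery_rates` across a grid of sample sizes and noise levels gives a rough view of how often each criterion recovers the assumed true model. The exact rates depend entirely on the illustrative DGP chosen here.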