Selection of the Best Subset in Regression Analysis

Abstract
The problem of selecting the best subset or subsets of independent variables in a multiple linear regression analysis is two-fold. The first, and most important, problem is the development of a criterion for choosing between two contending subsets. Applying such a criterion to all possible subsets may not be economically feasible if the number of independent variables is large, and so the second problem is concerned with decreasing the computational effort. This paper is concerned with the second problem, using the C_p-statistic of Mallows as the basic criterion for comparing two regressions. A procedure is developed which will indicate ‘good’ regressions with a minimum of computation.
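
To make the computational burden concrete, the following is a minimal sketch of the exhaustive baseline that the abstract describes as potentially infeasible: every one of the 2^t subsets of t candidate variables is fitted and ranked by Mallows' C_p = RSS_p / s^2 - n + 2p, where p counts the fitted coefficients including the intercept and s^2 is the residual mean square of the full model. It is not the reduced-computation procedure developed in the paper. The function names (solve, rss, all_subsets_cp) and the toy data are illustrative assumptions, not taken from the source.

```python
from itertools import combinations

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def rss(rows, y, cols):
    """Residual sum of squares for an intercept plus the predictors in cols."""
    design = [[1.0] + [row[j] for j in cols] for row in rows]
    k = len(cols) + 1
    # Normal equations (X'X) beta = X'y; adequate for a small illustration.
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, d))) ** 2
               for d, yi in zip(design, y))

def all_subsets_cp(rows, y):
    """Return (C_p, subset) pairs for all 2^t subsets, sorted by C_p."""
    n, t = len(rows), len(rows[0])
    # s^2 is estimated from the full model with t predictors and an intercept.
    s2 = rss(rows, y, tuple(range(t))) / (n - t - 1)
    results = []
    for size in range(t + 1):
        for cols in combinations(range(t), size):
            p = size + 1  # coefficients fitted, including the intercept
            results.append((rss(rows, y, cols) / s2 - n + 2 * p, cols))
    return sorted(results)

# Made-up toy data: 8 observations on 3 candidate variables.
rows = [
    [1.0, 2.0, 5.0], [2.0, 1.0, 3.0], [3.0, 4.0, 8.0], [4.0, 3.0, 1.0],
    [5.0, 6.0, 7.0], [6.0, 5.0, 2.0], [7.0, 8.0, 6.0], [8.0, 7.0, 4.0],
]
y = [5.1, 3.9, 11.2, 9.8, 17.1, 15.9, 23.0, 22.1]

ranked = all_subsets_cp(rows, y)
for cp, cols in ranked[:3]:
    print(cols, round(cp, 2))
```

The number of fits doubles with each added candidate variable, which is the cost a reduced-computation procedure aims to avoid. By construction, the full model always has C_p equal to its own p, so that value serves as a quick sanity check on any implementation.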