Restriction of Range Corrections When Both Distribution and Selection Assumptions Are Violated

Abstract
In validating a selection test (x) as a predictor of y, an incomplete xy data set must often be dealt with. A well-known correction formula is available for estimating the xy correlation in some total group using the xy data of the selected cases and x data of the unselected cases. The formula yields the ryx correlation (1) when the regression of y on x is linear and homoscedastic and (2) when selection can be assumed to be based on x alone. Although previous research has considered the accuracy of the correction formula when either Condition 1 or 2 is violated, no studies have considered the most realistic case where both Conditions 1 and 2 are simultaneously violated. In the present study six real data sets and five simulated selection models were used to investigate the accuracy of the correction formula when neither assumption is satisfied. Each of the data sets violated the linearity and/or homogeneity assumptions. Further, the selection models represent cases where selection is not a function of x alone. The results support two basic conclusions. First, the correction formula is not robust to violations of Conditions 1 and 2. Reasonably small errors occur only for very modest degrees of selection. Second, although biased, the correction formula can be less biased than the uncorrected correlation for certain distribution forms. However, for other distribution forms, the corrected correlation can be less accurate than the uncorrected correlation. A description of this latter type of distribution form is given.
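The abstract refers to the correction formula without reproducing it. As a minimal sketch, assuming the formula meant is the standard correction for explicit selection on x (commonly called Thorndike's Case 2), the corrected correlation can be computed from the restricted-sample correlation and the two standard deviations of x. The function and parameter names below are hypothetical and chosen only for illustration.

```python
import math


def correct_for_range_restriction(r_restricted, sd_x_restricted, sd_x_total):
    """Estimate the total-group xy correlation from selected-group data.

    Assumes the standard formula for direct selection on x (Thorndike's
    Case 2); the abstract does not state which formula it uses.

    r_restricted    -- xy correlation among the selected cases
    sd_x_restricted -- standard deviation of x among the selected cases
    sd_x_total      -- standard deviation of x in the total group
    """
    if not -1.0 <= r_restricted <= 1.0:
        raise ValueError("r_restricted must lie in [-1, 1]")
    if sd_x_restricted <= 0 or sd_x_total <= 0:
        raise ValueError("standard deviations must be positive")

    # Ratio of unrestricted to restricted spread of x.
    k = sd_x_total / sd_x_restricted

    # Corrected r = r*k / sqrt(1 - r^2 + r^2 * k^2).
    return r_restricted * k / math.sqrt(
        1.0 - r_restricted ** 2 + (r_restricted * k) ** 2
    )


if __name__ == "__main__":
    # Illustrative values only, not taken from the study.
    print(round(correct_for_range_restriction(0.30, 5.0, 10.0), 4))
```

When the two standard deviations are equal the function returns the input correlation unchanged. The formula is exact only under Conditions 1 and 2 described above, which is why its behavior under joint violations is the question at issue.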