Nonproportional Sampling and the Amplification of Correlations

Abstract
A theoretical analysis shows that sample correlations between two binary variables will be inflated when the frequency distributions of the two variables are flatter (i.e., closer to equal frequencies for the two values) in the sample than in the population. A correlation-assessment study in which participants were free to choose their own sample revealed an overwhelming preference for samples that included roughly the same number of observations for the two values of dichotomous variables, irrespective of their actual distribution in the population. Subjective estimates of observed correlations followed the sample correlations—which were inflated, as predicted—more closely than the true correlations. People's sampling behavior thus resembles that of a research designer who maximizes the chance of detecting a relationship, at the cost of diminished accuracy in estimating its strength.
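The inflation effect can be illustrated with a minimal numerical sketch. It is not the paper's own analysis: the base rate and conditional probabilities below are hypothetical values chosen for illustration, the helper names `phi` and `joint` are introduced here, and the sketch assumes that the sampler equalizes the frequencies of one variable (X) while the conditional probabilities of Y given X stay the same as in the population.

```python
from math import sqrt


def phi(a, b, c, d):
    """Phi coefficient of a 2x2 table with cell proportions
    a = P(X=1, Y=1), b = P(X=1, Y=0), c = P(X=0, Y=1), d = P(X=0, Y=0)."""
    return (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))


def joint(p_x1, p_y1_given_x1, p_y1_given_x0):
    """Cell proportions implied by a base rate for X and the two conditionals."""
    a = p_x1 * p_y1_given_x1
    b = p_x1 * (1 - p_y1_given_x1)
    c = (1 - p_x1) * p_y1_given_x0
    d = (1 - p_x1) * (1 - p_y1_given_x0)
    return a, b, c, d


# Hypothetical conditionals, held fixed in both the population and the sample.
P_Y1_GIVEN_X1 = 0.8
P_Y1_GIVEN_X0 = 0.2

# Skewed population: X=1 occurs in only 10% of cases (illustrative value).
phi_population = phi(*joint(0.1, P_Y1_GIVEN_X1, P_Y1_GIVEN_X0))

# Flattened sample: equal numbers of X=1 and X=0 observations.
phi_balanced = phi(*joint(0.5, P_Y1_GIVEN_X1, P_Y1_GIVEN_X0))

print(f"population phi: {phi_population:.2f}")
print(f"balanced-sample phi: {phi_balanced:.2f}")
```

Under these assumed values the phi coefficient is about 0.41 in the skewed population and 0.60 in the balanced sample, although the relationship between X and Y (the conditional probabilities) is identical in both tables.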