Sampling Weights and Regression Analysis

Abstract
Most major population surveys used by social scientists are based on complex sampling designs in which sampling units have different probabilities of being selected. Although sampling weights must generally be used to derive unbiased estimates of univariate population characteristics, the decision about their use in regression analysis is more complicated. Where sampling weights are solely a function of independent variables included in the model, unweighted ordinary least squares (OLS) estimates are preferred because they are unbiased, consistent, and have smaller standard errors than weighted OLS estimates. Where sampling weights are a function of the dependent variable (and thus of the error term), we recommend first attempting to respecify the model so that the weights are solely a function of the independent variables. If this can be accomplished, then unweighted OLS is again preferred. If the model cannot be respecified, then estimation of the model using sampling weights may be appropriate. In this case, however, the formula used by most computer programs for calculating standard errors will be incorrect. We recommend using the White heteroskedasticity-consistent estimator for the standard errors.
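
As an illustration of the weighted estimator with White standard errors, the following is a minimal sketch in Python for a regression with an intercept and one predictor. It assumes the basic (HC0) form of the White estimator, (X'WX)^-1 X'W diag(e^2) W X (X'WX)^-1, and it ignores clustering and stratification. The function name weighted_ols_white, the simulated data, and the weights are hypothetical and are not taken from the paper.

```python
import random


def weighted_ols_white(x, y, w):
    """Fit y = a + b*x by weighted least squares; return the coefficients and
    White (HC0) standard errors. Pass w = [1.0] * n for unweighted OLS."""
    # Entries of the 2x2 matrix X'WX, where each design row is (1, x_i).
    s_w = sum(w)
    s_wx = sum(wi * xi for wi, xi in zip(w, x))
    s_wxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    # Entries of the vector X'Wy.
    s_wy = sum(wi * yi for wi, yi in zip(w, y))
    s_wxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))

    # "Bread" of the sandwich: the inverse of X'WX.
    det = s_w * s_wxx - s_wx ** 2
    bread = [[s_wxx / det, -s_wx / det],
             [-s_wx / det, s_w / det]]

    # Weighted least squares coefficients: (X'WX)^-1 X'Wy.
    a = bread[0][0] * s_wy + bread[0][1] * s_wxy
    b = bread[1][0] * s_wy + bread[1][1] * s_wxy

    # "Meat" of the sandwich: sum over i of (w_i * e_i)^2 * (1, x_i)'(1, x_i).
    m00 = m01 = m11 = 0.0
    for xi, yi, wi in zip(x, y, w):
        g = (wi * (yi - a - b * xi)) ** 2
        m00 += g
        m01 += g * xi
        m11 += g * xi * xi
    meat = [[m00, m01], [m01, m11]]

    def matmul(p, q):
        # Product of two 2x2 matrices stored as nested lists.
        return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
                for i in range(2)]

    cov = matmul(matmul(bread, meat), bread)
    # max(..., 0.0) guards against tiny negative values from rounding.
    se = (max(cov[0][0], 0.0) ** 0.5, max(cov[1][1], 0.0) ** 0.5)
    return (a, b), se


# Hypothetical simulated data with heteroskedastic errors. The weights here
# depend only on x, the independent variable included in the model.
random.seed(0)
n = 500
x = [random.uniform(0, 10) for _ in range(n)]
y = [1.0 + 0.5 * xi + random.gauss(0, 1 + 0.2 * xi) for xi in x]
w = [1.0 if xi < 5 else 3.0 for xi in x]

coef_w, se_w = weighted_ols_white(x, y, w)          # weighted fit
coef_u, se_u = weighted_ols_white(x, y, [1.0] * n)  # unweighted fit
print("weighted:  ", coef_w, se_w)
print("unweighted:", coef_u, se_u)
```

Because the weights enter both the inverse of X'WX and the middle term, multiplying all weights by a positive constant leaves the coefficients and these standard errors unchanged. Statistical packages often apply an additional finite-sample correction, so their reported values may differ slightly from this sketch.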