The Role of Sample Cluster Means in Multilevel Models

Abstract
The paper explores some issues related to endogeneity in multilevel models, focusing on the case where the random effects are correlated with a level 1 covariate in a linear random intercept model. We consider two basic specifications, without and with the sample cluster mean. It is generally acknowledged that the omission of the cluster mean may cause omitted-variable bias. However, it is often neglected that the inclusion of the sample cluster mean in place of the population cluster mean entails a measurement error that yields biased estimators for both the slopes and the variance components. In particular, the contextual effect is attenuated, while the level 2 variance is inflated. We derive explicit formulae for measurement error biases that allow us to implement simple post-estimation corrections based on the reliability of the covariate. In the first part of the paper, the issue is tackled in a standard framework where the population cluster mean is treated as a latent variable. Later we consider a different framework arising when sampling from clusters of finite size, where the latent variable methods may have a poor performance, and we show how to effectively modify the measurement error correction. The theoretical analysis is supplemented with a simulation study and a discussion of the implications for effectiveness evaluation.