Use of Census-based Aggregate Variables to Proxy for Socioeconomic Group: Evidence from National Samples

Abstract
Increasingly, investigators append census-based socioeconomic characteristics of residential areas to individual records to address the problem of inadequate socioeconomic information in health data sets. Little empirical attention has been given to the validity of this approach. The authors estimate health outcome equations using samples from nationally representative data sets linked to census data. They investigate whether statistical power is sensitive to the timing of census data collection or to the level of aggregation of the census data; whether different census items are conceptually distinct; and whether the use of multiple aggregate measures in health outcome equations improves prediction compared with a single aggregate measure. The authors find little difference in estimates when using 1970 compared with 1980 US Bureau of the Census data or zip code-level compared with tract-level variables. However, aggregate variables are highly multicollinear. Associations of health outcomes with aggregate measures are substantially weaker than with microlevel measures. The authors conclude that aggregate measures cannot be interpreted as if they were microlevel variables, nor should a specific aggregate measure be interpreted as representing the effects of the characteristic it is labeled as measuring. Am J Epidemiol 1998; 148: 475–86.