Abstract
As with other statistical methods, missing data often create major problems for the estimation of structural equation models (SEMs). Conventional methods such as listwise or pairwise deletion generally do a poor job of using all the available information. However, structural equation modelers are fortunate that many programs for estimating SEMs now have maximum likelihood methods for handling missing data in an optimal fashion. In addition to maximum likelihood, this article also discusses multiple imputation. This method has statistical properties that are almost as good as those for maximum likelihood and can be applied to a much wider array of models and estimation methods.

Virtually all methods of statistical analysis are plagued by problems with missing data, and structural equation modeling is no exception. It is well known that the use of inappropriate methods for handling missing data can lead to bias in parameter estimates (Jones, 1996), bias in standard errors and test statistics (Glasser, 1964), and inefficient use of the data (Afifi & Elashoff, 1966).

This article surveys various methods that are available for handling missing data in the estimation of structural equation models (SEMs). After reviewing such conventional methods as listwise deletion, pairwise deletion, and regression imputation, I focus on the implementation of two newer methods, maximum likelihood and multiple imputation. These methods have much better statistical properties than conventional methods have under considerably weaker assumptions, a rare combination for new statistical methods.

Before discussing the methods, it is essential to clarify the meaning of certain assumptions that are often invoked in justifying one method or another (Rubin, 1976). To keep things simple, suppose that a data set contains only two variables, X and Y. We observe X for all cases, but data are missing on Y for, say, 20% of the cases.
We say that data on Y are missing completely at random (MCAR) if the probability that data are missing on Y depends on neither Y nor X. Formally, we have Pr(Y is missing | X, Y) = Pr(Y is missing).
