Estimating the distribution of times from HIV seroconversion to AIDS using multiple imputation

Abstract
Multiple imputation is a model‐based technique for handling missing data problems. In this application, we use the technique to estimate the distribution of times from HIV seroconversion to AIDS diagnosis with data from a cohort study of 4954 homosexual men with 4 years of follow‐up. In this example, the missing data are the dates of diagnosis with AIDS. The imputation procedure is performed in two stages. In the first stage, we estimate the residual AIDS‐free time distribution as a function of covariates measured on the study participants, with data provided by the participants who were seropositive at study entry. Specifically, we assume the residual AIDS‐free times follow a log‐normal regression model that depends on the covariates measured at enrolment on the seropositive participants. In the second stage, we impute the date of AIDS diagnosis for the participants who seroconverted during the course of the study and are AIDS‐free, using the log‐normal distribution estimated in the first stage and the covariates from each seroconverter's latest visit. The estimated proportions developing AIDS within 4 and within 7 years of seroconversion are 15 and 36 per cent, respectively, with associated 95 per cent confidence intervals of (10, 21) and (26, 47) per cent. We discuss the Bayesian foundations of the multiple imputation technique and the statistical and scientific assumptions.
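The two-stage procedure described above can be illustrated with a minimal sketch on synthetic data. It is not the authors' implementation. All function names, sample sizes, and parameter values are hypothetical. The sketch also makes simplifying assumptions that the study itself would not: residual AIDS‐free times are treated as fully observed (no censoring), seroconversion dates are treated as known, a single covariate is used, and the log‐normal regression parameters are drawn from their posterior under a standard noninformative prior before each imputation. Results are combined across imputations with Rubin's rules.

```python
import numpy as np

def fit_lognormal(X, t):
    """Stage 1: least-squares fit of log(t) = X @ beta + sigma * eps."""
    y = np.log(t)
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    dof = X.shape[0] - X.shape[1]
    s2 = resid @ resid / dof
    return beta, s2, XtX_inv, dof

def draw_parameters(rng, beta, s2, XtX_inv, dof):
    """One posterior draw of (beta, sigma) under a noninformative prior (assumed)."""
    sigma2 = dof * s2 / rng.chisquare(dof)
    beta_draw = rng.multivariate_normal(beta, sigma2 * XtX_inv)
    return beta_draw, np.sqrt(sigma2)

def impute_once(rng, fit, X_new, elapsed):
    """Stage 2: impute time from seroconversion to AIDS for each seroconverter.

    elapsed is the time from seroconversion to the latest visit; the residual
    AIDS-free time after that visit is drawn from the fitted log-normal model.
    """
    beta, sigma = draw_parameters(rng, *fit)
    noise = rng.standard_normal(X_new.shape[0])
    residual = np.exp(X_new @ beta + sigma * noise)
    return elapsed + residual

def rubin_combine(estimates, variances):
    """Combine per-imputation estimates and variances with Rubin's rules."""
    m = len(estimates)
    q_bar = np.mean(estimates)
    within = np.mean(variances)
    between = np.var(estimates, ddof=1)
    return q_bar, within + (1 + 1 / m) * between

rng = np.random.default_rng(0)

# Synthetic stand-in for participants seropositive at entry (hypothetical values).
n_pos, n_conv = 500, 100
X_pos = np.column_stack([np.ones(n_pos), rng.normal(size=n_pos)])
t_pos = np.exp(X_pos @ np.array([1.5, -0.4]) + 0.6 * rng.standard_normal(n_pos))
fit = fit_lognormal(X_pos, t_pos)

# Synthetic stand-in for seroconverters: covariates at latest visit, years elapsed.
X_conv = np.column_stack([np.ones(n_conv), rng.normal(size=n_conv)])
elapsed = rng.uniform(0.5, 3.0, size=n_conv)

# Repeat the imputation M times and estimate the proportion with AIDS within 4 years.
M = 20
props, variances = [], []
for _ in range(M):
    total_time = impute_once(rng, fit, X_conv, elapsed)
    p = np.mean(total_time <= 4.0)
    props.append(p)
    variances.append(p * (1 - p) / n_conv)

estimate, total_var = rubin_combine(props, variances)
print(f"proportion within 4 years: {estimate:.3f} (SE {np.sqrt(total_var):.3f})")
```

Drawing the regression parameters afresh for each imputation, rather than reusing the point estimates, is what lets the between-imputation variance reflect uncertainty in the fitted model as well as in the imputed times.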