Validation of a Hierarchical Deterministic Record-Linkage Algorithm Using Data From 2 Different Cohorts of Human Immunodeficiency Virus-Infected Persons and Mortality Databases in Brazil

Abstract
Loss to follow-up is a major source of bias in cohorts of patients with human immunodeficiency virus (HIV) and could lead to underestimation of mortality. The authors developed a hierarchical deterministic linkage algorithm to be used primarily with cohorts of HIV-infected persons to recover vital status information for patients lost to follow-up. Data from patients known to be deceased in 2 cohorts in Rio de Janeiro, Brazil, and data from the Rio de Janeiro State mortality database for 1999–2006 were used to validate the algorithm. A fully automated procedure yielded a sensitivity of 92.9% and specificity of 100% when no information was missing. When the automated procedure was combined with clerical review, in a scenario of 5% death prevalence and 20% missing mothers’ names, sensitivity reached 96.5% and specificity 100%. In a practical application, the algorithm significantly increased death rates and decreased the rate of loss to follow-up in the cohorts. The finding that 23.9% of matched records did not give HIV or acquired immunodeficiency syndrome as the cause of death reinforces the need to search all-cause mortality databases and alerts for possible underestimation of death rates. These results indicate that the algorithm is accurate enough to recover vital status information on patients lost to follow-up in cohort studies.