Designing a genome-based HIV incidence assay with high sensitivity and specificity

Abstract
Objective: Considerable inaccuracy in estimates of HIV incidence has been a serious obstacle to the development of efficient HIV/AIDS prevention and intervention strategies. Accurately distinguishing recent (incident) infections from chronic infections makes it possible to monitor epidemics and to evaluate the impact of HIV prevention and intervention trials. However, serological testing has not been able to deliver on this promise because of a number of critical limitations. The aim of our study is to design a novel scheme for identifying incident infections with high accuracy, based on the characteristics of HIV gene diversification within an infected individual. Methods: We perform a comprehensive meta-analysis of 5596 full-length HIV envelope gene sequences generated by single genome amplification and direct sequencing from 182 incident and 43 chronic cases. We devise a binary classification test based on the tail characteristics of the Hamming distance distribution of the sequences. Results: We identify a clear signature of incident infection: the presence of closely related strains among the HIV envelope gene sequences sampled from each infected patient, in both single-variant and multivariant transmissions. Sequence similarity, used as a biomarker, has specificity and sensitivity greater than 95% and is robust to viral and host-specific factors such as the clade of the viral strain, the viral load, and the length and location of the sequences within the HIV envelope gene. Conclusion: Because of rapid and continuing improvements in sequencing technology and cost, sequence-based incidence assays hold great promise as a means of quantifying HIV incidence from a single blood test.
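
As an illustration of the kind of test described in the Methods, the following minimal sketch classifies a patient's aligned sequences from the lower tail of their pairwise Hamming distance distribution. It is not the implementation used in the study: the function names, the choice of the lower tail, the quantile `q`, and the cutoff `threshold` are all hypothetical placeholders chosen for illustration, and the sequences are assumed to be already aligned to equal length.

```python
import math
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Count positions at which two aligned, equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b))


def pairwise_distances(seqs):
    """Sorted Hamming distances over all pairs, normalized by sequence length."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    return sorted(hamming(a, b) / len(a) for a, b in combinations(seqs, 2))


def lower_tail(distances, q=0.1):
    """Nearest-rank q-quantile of an ascending list of distances."""
    k = max(0, math.ceil(q * len(distances)) - 1)
    return distances[k]


def classify(seqs, q=0.1, threshold=0.005):
    """Label a sample 'incident' if its lower-tail distance is at most threshold.

    Both q and threshold are illustrative assumptions, not values from the study.
    """
    tail = lower_tail(pairwise_distances(seqs), q)
    return "incident" if tail <= threshold else "chronic"


# Toy data only: near-identical sequences versus widely diverged ones.
near_identical = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAT", "ACGTACGTAC"]
diverged = ["AAAAAAAAAA", "CCCCCAAAAA", "AAAAACCCCC", "GGGGGGGGGG"]
print(classify(near_identical))
print(classify(diverged))
```

Using a lower-tail quantile rather than the mean distance reflects the signature reported in the Results: in a multivariant transmission the overall diversity can be large, yet closely related pairs still appear within each transmitted lineage. In practice, any cutoff would have to be calibrated on sequences from cases with known infection status.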