Highly Sensitive and Specific Detection of Rare Variants in Mixed Viral Populations from Massively Parallel Sequence Data
Open Access
- 15 March 2012
- journal article
- research article
- Published by Public Library of Science (PLoS) in PLoS Computational Biology
- Vol. 8 (3), e1002417
- https://doi.org/10.1371/journal.pcbi.1002417
Abstract
Viruses diversify over time within hosts, often undercutting the effectiveness of host defenses and therapeutic interventions. To design successful vaccines and therapeutics, it is critical to better understand viral diversification, including comprehensively characterizing the genetic variants in viral intra-host populations and modeling changes from transmission through the course of infection. Massively parallel sequencing technologies can overcome the cost constraints of older sequencing methods and obtain the high sequence coverage needed to detect rare genetic variants (97% sensitivity and >97% specificity on control read sets. On data derived from a patient after four years of HIV-1 infection, V-Phaser detected 2,015 variants across the ∼10 kb genome, including 603 rare variants (V-Phaser identified variants at frequencies down to 0.2%, comparable to the detection threshold of allele-specific PCR, a method that requires prior knowledge of the variants. The high sensitivity and specificity of V-Phaser enable identifying and tracking changes in low-frequency variants in mixed populations such as RNA viruses.
New sequencing technologies provide unprecedented resolution to study pathogen populations, such as the single-stranded RNA viruses HIV, dengue (DENV), and West Nile (WNV), and how they evolve within infected individuals in response to immune, therapeutic, and vaccine pressures. While these new technologies provide high volumes of data, these data contain process errors. To detect biological variants, especially those occurring at low frequencies in the population, these technologies require a method to differentiate biological variants from process errors with high sensitivity and specificity. To address this challenge, we introduce the V-Phaser algorithm, which distinguishes the covariation of biological variants from that of process errors. We validate the method by measuring how frequently it correctly identifies variants and errors on actual read sets with known variation. Further, using data derived from a patient following four years of HIV-1 infection, we show that V-Phaser can detect biological variants at frequencies comparable to approaches that require prior knowledge. V-Phaser is available for download at: http://www.broadinstitute.org/scientific-community/software.
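The validation described above, counting how often true variants and process errors are correctly identified on read sets with known variation, comes down to computing sensitivity and specificity. The following is a minimal sketch of that bookkeeping, not V-Phaser's own code: the `(position, base)` variant representation, the function name `sensitivity_specificity`, and the toy data are all assumptions made for illustration.

```python
from typing import Set, Tuple

# Hypothetical representation: (genome position, non-consensus base).
Variant = Tuple[int, str]


def sensitivity_specificity(
    called: Set[Variant],
    known: Set[Variant],
    candidates: Set[Variant],
) -> Tuple[float, float]:
    """Score a variant caller against a control set with known variation.

    called     -- variants the caller reported
    known      -- true biological variants in the control sample
    candidates -- every non-consensus observation in the reads
                  (true variants plus process errors)
    """
    errors = candidates - known            # observations that are process errors
    true_pos = len(called & known)         # real variants that were called
    false_neg = len(known - called)        # real variants that were missed
    false_pos = len(called & errors)       # errors wrongly called as variants
    true_neg = len(errors - called)        # errors correctly rejected

    sensitivity = true_pos / (true_pos + false_neg) if known else float("nan")
    specificity = true_neg / (true_neg + false_pos) if errors else float("nan")
    return sensitivity, specificity


# Toy control set (invented values, for illustration only).
known = {(100, "A"), (250, "G"), (731, "T"), (900, "C")}
candidates = known | {(12, "C"), (400, "T"), (555, "G"), (800, "A")}
called = {(100, "A"), (250, "G"), (731, "T"), (400, "T")}

sens, spec = sensitivity_specificity(called, known, candidates)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

In this toy case three of four known variants are called and three of four errors are rejected, so both values are 0.75. Treating every non-consensus observation as a candidate is one possible way to define the negative class; the paper's exact definition may differ.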