Intrahost SARS-CoV-2 k-mer Identification Method (iSKIM) for Rapid Detection of Mutations of Concern Reveals Emergence of Global Mutation Patterns
Open Access
- 27 September 2022
- Vol. 14 (10), 2128
- https://doi.org/10.3390/v14102128
Abstract
Despite unprecedented global sequencing and surveillance of SARS-CoV-2, timely identification of the emergence and spread of novel variants of concern (VoCs) remains a challenge. Several million raw genome sequencing runs are now publicly available. We sought to survey these datasets for intrahost variation to study emerging mutations of concern. We developed iSKIM (“intrahost SARS-CoV-2 k-mer identification method”) to screen the many SARS-CoV-2 datasets relatively quickly and efficiently and to identify intrahost mutations belonging to lineages of concern. Certain mutations surged in frequency as intrahost minor variants just prior to, or while, lineages of concern arose. The Spike N501Y change common to several VoCs was found as a minor variant in 834 samples as early as October 2020. This coincides with the timing of the first detected samples with this mutation in the Alpha/B.1.1.7 and Beta/B.1.351 lineages. Using iSKIM, we also found that Spike L452R was detected as an intrahost minor variant as early as September 2020, prior to the observed rise of the Epsilon/B.1.429/B.1.427 lineages in late 2020. iSKIM rapidly screens for mutations of interest in raw data, prior to genome assembly, and can be used to detect increases in intrahost variants, potentially providing an early indication of novel variant spread.
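The abstract describes screening raw reads for mutations of interest with k-mers before any assembly. The following is a minimal sketch of that general idea, not the iSKIM implementation: it builds one reference k-mer and one mutant k-mer centred on a site, counts the reads containing each (on either strand), and reports the mutant fraction. All function names, the default k of 21, and the substring-matching strategy are assumptions made for illustration.

```python
from typing import Dict, Iterable, Tuple

# Translation table for complementing DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def site_kmers(reference: str, pos: int, alt: str, k: int = 21) -> Tuple[str, str]:
    """Build (ref_kmer, alt_kmer) of length k centred on 0-based position pos.

    k is assumed to be odd so the mutated base sits exactly in the middle.
    """
    if k % 2 == 0:
        raise ValueError("k must be odd")
    half = k // 2
    start = pos - half
    if start < 0 or start + k > len(reference):
        raise ValueError("k-mer window falls outside the reference")
    ref_kmer = reference[start:start + k]
    # Swap only the central base to obtain the mutant k-mer.
    alt_kmer = ref_kmer[:half] + alt + ref_kmer[half + 1:]
    return ref_kmer, alt_kmer


def screen_reads(reads: Iterable[str], ref_kmer: str, alt_kmer: str) -> Dict[str, int]:
    """Count reads that contain the reference or mutant k-mer on either strand."""
    ref_forms = (ref_kmer, reverse_complement(ref_kmer))
    alt_forms = (alt_kmer, reverse_complement(alt_kmer))
    counts = {"ref": 0, "alt": 0}
    for read in reads:
        read = read.upper()
        if any(form in read for form in ref_forms):
            counts["ref"] += 1
        if any(form in read for form in alt_forms):
            counts["alt"] += 1
    return counts


def alt_fraction(counts: Dict[str, int]) -> float:
    """Fraction of informative reads supporting the mutant k-mer."""
    total = counts["ref"] + counts["alt"]
    return counts["alt"] / total if total else 0.0
```

In this sketch, a sample in which most informative reads carry the reference k-mer but a small share carry the mutant k-mer would be flagged as holding the mutation as an intrahost minor variant. Exact substring matching ignores sequencing errors and neighbouring mutations within the k-mer window, so a practical screen would need additional k-mers or tolerance for mismatches.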
Funding Information
- Department of Defense Global Emerging Infections Surveillance (GEIS) section of the Armed Forces Health Surveillance Division (ProMIS IDs P0169_21_WR and P0130_22_WR)