Single-nucleotide conservation state annotation of the SARS-CoV-2 genome
Open Access
- 3 June 2021
- journal article
- research article
- Published by Springer Science and Business Media LLC in Communications Biology
- Vol. 4 (1), 1-11
- https://doi.org/10.1038/s42003-021-02231-w
Abstract
Given the global impact and severity of COVID-19, there is a pressing need for a better understanding of the SARS-CoV-2 genome and mutations. Multi-strain sequence alignments of coronaviruses (CoV) provide important information for interpreting the genome and its variation. We apply a comparative genomics method, ConsHMM, to the multi-strain alignments of CoV to annotate every base of the SARS-CoV-2 genome with conservation states based on sequence alignment patterns among CoV. The learned conservation states show distinct enrichment patterns for genes, protein domains, and other regions of interest. Certain states are strongly enriched or depleted of SARS-CoV-2 mutations, which can be used to predict potentially consequential mutations. We expect the conservation states to be a resource for interpreting the SARS-CoV-2 genome and mutations.
Funding Information
- UC | UCLA | Eli and Edythe Broad Center of Regenerative Medicine and Stem Cell Research, University of California Los Angeles
- U.S. Department of Health & Human Services | National Institutes of Health (DP1DA044371)
- National Science Foundation (2125664)
This publication has 35 references indexed in Scilit:
- Partitioning heritability by functional annotation using genome-wide association summary statistics. Nature Genetics, 2015
- cocor: A Comprehensive Solution for the Statistical Comparison of Correlations. PLOS ONE, 2015
- Genomic and Network Patterns of Schizophrenia Genetic Variation in Human Evolutionary Accelerated Regions. Molecular Biology and Evolution, 2015
- ChromHMM: automating chromatin-state discovery and characterization. Nature Methods, 2012
- PHAST and RPHAST: phylogenetic analysis with space/time models. Briefings in Bioinformatics, 2010
- Detection of nonneutral substitution rates on mammalian phylogenies. Genome Research, 2009
- Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics, 2009
- Toward using confidence intervals to compare correlations. Psychological Methods, 2007
- Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Research, 2005
- Aligning Multiple Genomic Sequences With the Threaded Blockset Aligner. Genome Research, 2004