Identification of Novel Missense Mutations in a Large Number of Recent SARS-CoV-2 Genome Sequences

Abstract
SARS-CoV-2 infection has spread to over 200 countries since it was first reported in December 2019. Significant country-specific variations in infection and mortality rates have been noted. We performed a sequence analysis of 474 SARS-CoV-2 genomes submitted to GenBank up to April 11, 2020 and identified 5 recently emerged mutations in many of the isolates (up to 40%). This finding was verified on a larger scale using 8,008 SARS-CoV-2 sequences from the GISAID database. Our analysis highlights 5 frequent new mutations that have emerged since late February 2020: one missense (non-synonymous) mutation each in orf1ab (C1059T), orf3 (G25563T), and orf8 (C27964T); one in the 5’ UTR (C241T); and one in a non-coding region (G29553A). The three missense mutations lead to amino acid substitutions in the viral protein sequence. The non-coding mutation G29553A was found to be almost exclusive to the US isolates. Except for C241T, all the novel mutations identified are absent in the isolates from Italy and Spain. Although the clinical significance of these mutations is currently unclear, the findings lay the foundation for further study of the impact of SARS-CoV-2 mutations on disease incidence, severity, and host immune response. They may also provide insights into vaccine development and serological response detection for the virus.