Rampant C-to-U deamination accounts for the intrinsically high mutation rate in SARS-CoV-2 spike gene

Abstract
The high mutation rate of SARS-CoV-2 greatly complicates control of the pandemic. In particular, it remains unclear why the spike (S) gene has an extraordinarily high mutation rate relative to other SARS-CoV-2 genes. By analyzing fixed synonymous mutations between SARS-CoV-2 and RaTG13, and by profiling the derived allele frequency (DAF) of polymorphic synonymous sites among millions of worldwide SARS-CoV-2 strains, we found that both fixed and polymorphic mutations occur at higher rates in the S gene than in other genes. The majority of mutations are C-to-T, reflecting APOBEC-mediated C-to-U deamination rather than the previously proposed A-to-I deamination. Both in silico and in vivo evidence indicates that the S gene is more likely to be single-stranded than other SARS-CoV-2 genes, consistent with the preference of APOBEC for ssRNA. We conclude that the single-stranded nature of the S gene makes it a favorable target for C-to-U deamination, leading to its excessively high mutation rate compared with non-S genes. Thus, APOBEC, rather than ADAR, is the “editor-in-chief” of SARS-CoV-2 RNAs. This work helps elucidate the molecular mechanism underlying the mutation and evolution of SARS-CoV-2 and may contribute to control of the pandemic.
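
As an illustration of the two quantities named in the abstract, the following minimal Python sketch computes a DAF for one polymorphic site and the share of C-to-T changes in a mutation spectrum. It is not the pipeline used in this work: the function names, the handling of ambiguous bases, and the toy data are all assumptions made for the example, and the ancestral base is simply supplied by the caller (in practice it would be inferred from an outgroup such as RaTG13).

```python
from collections import Counter

VALID_BASES = {"A", "C", "G", "T"}


def derived_allele_frequency(alleles, ancestral):
    """Fraction of sampled strains carrying a non-ancestral base at one site.

    Assumption for this sketch: gaps and ambiguous calls (e.g. 'N') are
    dropped before the frequency is computed.
    """
    called = [a for a in alleles if a in VALID_BASES]
    if not called:
        raise ValueError("no usable base calls at this site")
    derived = sum(1 for a in called if a != ancestral)
    return derived / len(called)


def mutation_spectrum(sites):
    """Count substitution types such as 'C>T' over (ancestral, derived) pairs."""
    return Counter(f"{anc}>{der}" for anc, der in sites if anc != der)


# Toy data, invented purely for illustration.
column = list("CCCTTCCTCN")  # one alignment column: 10 strains, 1 ambiguous call
daf = derived_allele_frequency(column, ancestral="C")

# In sequencing data, C-to-U deamination appears as C>T.
toy_sites = [("C", "T"), ("C", "T"), ("A", "G"), ("C", "T"), ("G", "A")]
spectrum = mutation_spectrum(toy_sites)
c_to_t_fraction = spectrum["C>T"] / sum(spectrum.values())

print(f"DAF = {daf:.3f}")
print(f"C>T fraction = {c_to_t_fraction:.2f}")
```

With these toy inputs, three of the nine usable calls are derived and three of the five substitutions are C>T. A real analysis would restrict the sites to synonymous positions and compare the S gene against the other genes.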