Patterns of Evolution and Host Gene Mimicry in Influenza and Other RNA Viruses

Abstract
It is well known that the dinucleotide CpG is under-represented in the genomic DNA of many vertebrates. This is commonly attributed to the methylation of cytosine residues in this dinucleotide and the correspondingly high rate of deamination of 5-methylcytosine, which lowers the frequency of CpG in DNA. Surprisingly, many single-stranded RNA viruses that replicate in these vertebrate hosts also contain very few CpG dinucleotides in their genomes. Viruses are obligate intracellular parasites, and the evolution of a virus is inexorably linked to the nature and fate of its host. One therefore expects virus and host genomes to share common features. In this work, we compare evolutionary patterns in the genomes of ssRNA viruses and their hosts. In particular, we have analyzed dinucleotide patterns and found that the same dinucleotides are pervasively over- or under-represented in many RNA viruses and their hosts, suggesting that many RNA viruses evolve by mimicking some features of their host's genes (DNA) and likely also of the corresponding mRNAs. When a virus crosses a species barrier into a different host, the pressure to replicate, survive and adapt leaves a footprint in its dinucleotide frequencies. For instance, since human genes seem to be under stronger pressure than avian genes to eliminate CpG dinucleotide motifs, this pressure might be reflected in the genomes of human viruses (DNA and RNA viruses) when compared with those of the same viruses replicating in avian hosts. To test this idea, we have analyzed the evolution of the influenza virus since 1918. We find that the influenza A virus, which originated from an avian reservoir and has been replicating in humans over many generations, evolves under strong selection toward a lower frequency of CpG dinucleotides in its genome.
Consistent with this observation, we find that the influenza B virus, which has spent much more time in the human population, has adapted to its human host and exhibits an extremely low CpG dinucleotide content. We believe that these observations directly show that the evolution of RNA viral genomes can be shaped by pressures observed in the host genome. As a possible explanation, we suggest that the strong selection pressures acting on these RNA viruses are most likely related to the innate immune response and to nucleotide motifs in the host DNA and RNAs.

Viruses are obligate intracellular parasites that use different strategies to sequester host cell machinery and avoid the host immune system. In this paper we explore the genomes of viruses that encode their genetic information in single-stranded RNA, a different material from the one used by their hosts (double-stranded DNA). Interestingly, these viruses share some of the host's characteristics. For instance, one of the most under-represented motifs in the DNA of vertebrates is the dinucleotide CpG. This is commonly attributed to methylation and deamination of cytosine residues in this dinucleotide. Surprisingly, the same CpG suppression is observed in vertebrate RNA viruses but not in RNA phages. We show that RNA viruses are subject to dinucleotide pressures similar to those acting on their host genes. We find that the influenza A virus, which originated from an avian reservoir and has replicated in humans over many generations, evolves to reduce the frequency of CpG dinucleotides, mimicking human genes. Influenza B, which has circulated in humans longer, exhibits an extremely low CpG dinucleotide content. These observations suggest that the evolution of RNA viruses is shaped by pressures observed in the host genome.
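To make "over- or under-represented" concrete, the sketch below computes a common dinucleotide relative-abundance (odds) ratio, rho_XY = f_XY / (f_X * f_Y), where f_XY is the frequency of dinucleotide XY among adjacent pairs and f_X, f_Y are mononucleotide frequencies. This Karlin-style ratio is assumed here purely for illustration; it is not necessarily the statistic used in this work. The function name `dinucleotide_odds_ratios` is hypothetical, the sketch counts a single strand only, and it converts T to U so that DNA and RNA sequences are scored on the same alphabet.

```python
from __future__ import annotations

from collections import Counter
from itertools import product

NUCLEOTIDES = "ACGU"


def dinucleotide_odds_ratios(seq: str) -> dict[str, float]:
    """Hypothetical helper: rho_XY = f_XY / (f_X * f_Y) for all 16 dinucleotides.

    Assumptions (illustrative only): single-strand counts, T treated as U,
    and any character outside A/C/G/U (e.g. N) is skipped.
    """
    seq = seq.upper().replace("T", "U")
    # Mononucleotide counts over valid bases only.
    mono = Counter(c for c in seq if c in NUCLEOTIDES)
    # Overlapping adjacent pairs; a pair is skipped if either base is invalid.
    pairs = Counter(
        seq[i:i + 2]
        for i in range(len(seq) - 1)
        if seq[i] in NUCLEOTIDES and seq[i + 1] in NUCLEOTIDES
    )
    n_mono = sum(mono.values())
    n_pairs = sum(pairs.values())
    if n_pairs == 0:
        raise ValueError("sequence has no valid adjacent nucleotide pairs")

    ratios: dict[str, float] = {}
    for x, y in product(NUCLEOTIDES, repeat=2):
        fx = mono[x] / n_mono
        fy = mono[y] / n_mono
        fxy = pairs[x + y] / n_pairs
        # Undefined (NaN) when X or Y never occurs in the sequence.
        ratios[x + y] = fxy / (fx * fy) if fx and fy else float("nan")
    return ratios


# Usage: a short made-up sequence that contains GpC but no CpG.
print(dinucleotide_odds_ratios("AUGCAUGC")["CG"])
```

A value of rho_CG well below 1 means CpG occurs less often than expected from the sequence's C and G content alone, which is the kind of CpG suppression described above for vertebrate genes and their RNA viruses; normalizing by base composition keeps GC-rich and GC-poor genomes comparable.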