Viral adaptation to host: a proteome‐based analysis of codon usage and amino acid preferences

Abstract
Viruses differ markedly in their specificity toward host organisms. Here, we test the level of general sequence adaptation that viruses display toward their hosts. We compiled a representative data set of viruses that infect hosts ranging from bacteria to humans. We consider their respective amino acid and codon usages and compare them among the viruses and their hosts. We show that bacteria‐infecting viruses are strongly adapted to their specific hosts, but that they differ from other unrelated bacterial hosts. Viruses that infect humans, but not those that infect other mammals or aves, show a strong resemblance to most mammalian and avian hosts, in terms of both amino acid and codon preferences. In groups of viruses that infect humans or other mammals, the highest observed level of adaptation of viral proteins to host codon usages is for those proteins that appear abundantly in the virion. In contrast, proteins that are known to participate in host‐specific recognition do not necessarily adapt to their respective hosts. The implication for the potential of viral infectivity is discussed.

Synopsis
Viruses are autonomous entities with an extremely fast evolution rate. They invade their host and replicate to produce new viral particles. These processes take place only inside their hosts’ cellular environment. To activate their reproductive cycle, viruses typically have to override their hosts’ translational machinery, and they must also evade the hosts’ immune system and other defense mechanisms. These basic observations make the evolutionary interactions between hosts and their infecting viruses particularly interesting to investigate. There are several critical parameters that determine the selectivity with which viruses infect their hosts. These include the number of viruses that are produced in each infected cell, the host's population size, and its generation time. 
Further parameters are the stability of the virus in the hostile environment outside the cell and, most importantly, the molecular specificity of recognition that underlies the entry of the virus into the host. Studies of the evolutionary history of viral adaptation suggest the existence of a rich web of interactions that involve both the host and virus codon usage, the virus replication mode, genome size, and the variety of its potential hosts. It was also proposed that the extremely high mutation rates in viruses (especially RNA viruses) outpace the evolutionary processes of selection that drive codon preference optimization of viruses and their cognate hosts. For certain viruses, genome‐wide mutational pressures override the selection for specific codons. In this study, we took advantage of the fast growth in sequencing data for many model organisms as well as for thousands of viral genomes. Such advances have made it possible for us to compile a balanced data set for further analysis. This set includes ∼300 representative viruses whose hosts range from humans to bacteria, and whose genomes have been completely sequenced. We had to overcome the difficulty that arises from the fact that although certain viruses infect a broad range of species, others infect only a single host. We solved this problem by developing a consistent virus‐to‐host mapping. Our main objective was to answer the following question: notwithstanding the enormous diversity among viruses, is there an overall well‐defined and measurable molecular similarity between viruses and their hosts? Such similarity, should it exist, can presumably be considered a manifestation of some molecular adaptation mechanism. We develop a statistical framework to provide an unbiased assessment of the mutual distances between all viruses and all recognized hosts. 
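The notion of a measurable distance between a virus and a host can be illustrated with a minimal sketch. The synopsis does not specify the distance measure or the statistical framework, so everything here is an assumption made for illustration: the function names are hypothetical, the Euclidean distance between relative codon frequency vectors is a stand-in rather than the authors' method, and the sequences are toy strings, not real genes.

```python
from collections import Counter
from itertools import product
from math import sqrt

# All 64 codons in a fixed order, so frequency vectors are comparable.
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]
CODON_SET = set(CODONS)


def codon_frequencies(cds):
    """Relative frequency of each codon in an in-frame coding sequence."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if c in CODON_SET)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no valid codons found")
    return [counts[c] / total for c in CODONS]


def codon_distance(cds_a, cds_b):
    """Euclidean distance between two codon frequency vectors.

    Assumption: a simple stand-in for whatever distance the study used.
    """
    fa, fb = codon_frequencies(cds_a), codon_frequencies(cds_b)
    return sqrt(sum((x - y) ** 2 for x, y in zip(fa, fb)))


def gc_content(seq):
    """Fraction of G and C among the unambiguous bases of a sequence."""
    seq = seq.upper().replace("U", "T")
    bases = [b for b in seq if b in "ACGT"]
    if not bases:
        raise ValueError("no unambiguous bases found")
    return sum(b in "GC" for b in bases) / len(bases)


if __name__ == "__main__":
    # Toy sequences for illustration only.
    virus_cds = "ATGAAAGCTGCTTAA"
    host_cds = "ATGAAGGCCGCCTGA"
    print(codon_distance(virus_cds, host_cds))
    print(gc_content(virus_cds), gc_content(host_cds))
```

In practice the coding sequences of a whole proteome would be concatenated or pooled before counting, and a smaller distance would be read as closer codon usage between virus and host.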
To test the hypothesis of a molecular adaptation of viruses toward their hosts, we focus on the codon usage and on the amino acid preferences of viruses that are grouped at varying taxonomical granularities. We observe that all bacteriophages are strongly tuned to match their unique bacterial hosts and this correspondence is also evident in their genomic GC contents. However, somewhat surprisingly, viruses that infect humans resemble not only the codon preferences and amino acid frequencies of humans but also, to an equal degree, those of 10 additional mammalian hosts. This similarity even extends to aves and several insects. This observation does not hold for viruses that infect other mammals, despite a strong similarity in codon usage among most mammals. Finally, we show that viral selection of codon usage toward that of the host has not occurred uniformly for all proteins of the virus; rather, it is dominated by the set of proteins expressed in high abundance. These observations have implications for viral evolution and for the potential for zoonotic epidemics. It is likely that the domestication of, and the close interaction between, humans, rats, and farm animals for thousands of years have led to the evolution of viruses that infect humans and are adapted toward a broad range of hosts. During the last century, with the growth in human population and global traffic, we have witnessed instances of viruses that crossed the host barrier and were introduced into the human population. Known examples are HIV in the early 1980s, SARS in 2003, and the latest epidemic of the H1N1 swine flu in 2009. The similarities in codon usage and amino acid composition that we have observed in this work may relate, to some extent, to the potential for zoonosis. 
Although these molecular properties are neither necessary nor sufficient conditions for host shifts, our analysis can nevertheless contribute to a framework that would, on the one hand, permit analysis of the potential of certain viruses to adapt to new host species and, on the other, allow the development of attenuated viruses for vaccination.

Mol Syst Biol. 5: 311