Gain and Loss of Multiple Genes During the Evolution of Helicobacter pylori

Abstract
Sequence diversity and gene content distinguish most isolates of Helicobacter pylori. Even greater sequence differences differentiate distinct populations of H. pylori from different continents, but it was not clear whether these populations also differ in gene content. To address this question, we tested 56 globally representative strains of H. pylori and four strains of Helicobacter acinonychis with whole-genome microarrays. Of the weighted average of 1,531 genes present in the two sequenced genomes, 25% are absent in at least one strain of H. pylori and 21% are absent or variable in H. acinonychis. We extrapolate that the core genome present in all isolates of H. pylori contains 1,111 genes. Variable genes tend to be small and possess unusual GC content; many of them have probably been imported by horizontal gene transfer. Phylogenetic trees based on the microarray data differ from those based on sequences of seven genes from the core genome. These discrepancies are due to homoplasies resulting from independent gene loss by deletion or recombination in multiple strains, which distort phylogenetic patterns. Comparing the patterns of these discrepancies with population structure allows the timing of the acquisition of variable genes within this species to be reconstructed. Variable genes that are located within the cag pathogenicity island were apparently first acquired en bloc after speciation. In contrast, most other variable genes are of unknown function or encode restriction/modification enzymes, transposases, or outer membrane proteins. These seem to have been acquired prior to speciation of H. pylori and were subsequently lost by convergent evolution within individual strains. Thus, the use of microarrays can reveal patterns of gene gain or loss when examined within a phylogenetic context that is based on sequences of core genes.
The Gram-negative pathogenic bacterium Helicobacter pylori colonizes the stomach of 50% of mankind and has probably infected humans since their origins. Due to geographic isolation and frequent local recombination, phylogeographic differences within H. pylori have arisen, resulting in multiple populations and subpopulations that mirror ancient human migrations and genetic diversity. We have examined the gene content of representatives of these populations by whole-genome microarrays. Of the 1,531 genes that are present on average in the two sequenced genomes, only 1,111 are predicted to exist in all H. pylori. Missing genes fall into two classes. The first class contains genes within the cag pathogenicity island, which was acquired en bloc after speciation and is present only in particular populations. The second class contains a variety of genes whose function may be unimportant for the cell and that were acquired prior to speciation. Their absence in individual isolates reflects convergent evolution through gene loss. Thus, patterns of gene gain or loss can be identified by whole-genome microarrays within a phylogenetic context that can be supplied by sequences of genes from the core genome.
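
The distinction between core and variable genes can be illustrated with a minimal Python sketch. It assumes that microarray results have already been reduced to present/absent calls per gene and strain; the gene labels, the toy data, and the function names are hypothetical and are not taken from the study. The sketch only partitions the genes observed in the sampled strains. It does not reproduce the extrapolation that yields the estimate of 1,111 core genes, and the observed core can only shrink as more strains are added.

```python
from typing import Dict, List, Tuple


def partition_genes(presence: Dict[str, List[bool]]) -> Tuple[List[str], List[str]]:
    """Split genes into core (present in every strain) and variable
    (absent in at least one strain), preserving input order."""
    core: List[str] = []
    variable: List[str] = []
    for gene, calls in presence.items():
        if all(calls):
            core.append(gene)
        else:
            variable.append(gene)
    return core, variable


def variable_fraction(presence: Dict[str, List[bool]]) -> float:
    """Fraction of genes that are absent in at least one strain."""
    if not presence:
        return 0.0
    _, variable = partition_genes(presence)
    return len(variable) / len(presence)


# Hypothetical calls for four genes across three strains
# (True = present, False = absent).
toy_calls = {
    "gene_a": [True, True, True],
    "gene_b": [True, False, True],
    "gene_c": [True, True, True],
    "gene_d": [False, False, True],
}

core_genes, variable_genes = partition_genes(toy_calls)
print("core:", core_genes)
print("variable:", variable_genes)
print("variable fraction:", variable_fraction(toy_calls))
```

In this toy example, two of the four genes are absent in at least one strain, so the variable fraction is 0.5. Applying the same count to the 56 H. pylori strains is what the 25% figure summarizes.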