Coevolution of DNA Uptake Sequences and Bacterial Proteomes

Abstract
Dramatic examples of repeated sequences occur in the genomes of some naturally competent bacteria, which contain hundreds or thousands of copies of short motifs called DNA uptake signal sequences. Here, we analyze the evolutionary interactions between coding-region uptake sequences and the proteomes of Haemophilus influenzae, Actinobacillus pleuropneumoniae, and Neisseria meningitidis. In all three genomes, uptake sequence accumulation in coding sequences has approximately doubled the frequencies of those tripeptides specified by each species’ uptake sequence. The presence of uptake sequences in particular reading frames correlated most strongly with the use of preferred codons at degenerately coded positions, but the density of uptake sequences correlated only poorly with protein functional category. Genes lacking homologs in related genomes also lacked uptake sequences, strengthening the evidence that uptake sequences do not drive lateral gene transfer between distant relatives but instead accumulate after genes have been transferred. Comparison of the uptake sequence-encoded peptides of H. influenzae and N. meningitidis proteins with their homologs from related bacteria without uptake sequences indicated that uptake sequences were also preferentially located in poorly conserved genes and at poorly conserved amino acids. With few exceptions, amino acids at positions encoded by uptake sequences were as well conserved as other amino acids, suggesting that extant uptake sequences impose little or no constraint on coding for protein function. However, this state is likely to be achieved at a substantial cost because of the selective deaths required to eliminate maladaptive mutations that improve uptake sequences.
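To illustrate what "tripeptides specified by each species' uptake sequence" means, the following minimal sketch translates a motif in all six reading frames. It is an illustration only, not the analysis pipeline used in this study. The 9-bp H. influenzae core AAGTGCGGT is supplied here as an assumption (it is not stated in the abstract), the codon table is the standard genetic code, and the names reverse_complement and encoded_peptides are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order
# (third base varies fastest), one amino-acid letter per codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def encoded_peptides(motif):
    """Translate the complete codons of a motif in all six reading frames.

    Returns a dict keyed by (strand, frame); partial codons at either
    end of the motif are ignored.
    """
    peptides = {}
    for strand, seq in (("+", motif), ("-", reverse_complement(motif))):
        for frame in range(3):
            codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
            peptides[(strand, frame)] = "".join(CODON_TABLE[c] for c in codons)
    return peptides


# Assumed 9-bp core of the H. influenzae uptake sequence (illustrative input).
HIN_USS_CORE = "AAGTGCGGT"
for key, peptide in encoded_peptides(HIN_USS_CORE).items():
    print(key, peptide)
```

Under these assumptions, only frame 0 of each strand contains three complete codons, giving the tripeptides KCG (forward strand) and TAL (reverse strand). The other frames contain two complete codons each, so their tripeptides also depend on the bases flanking the core.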