Proteomic signatures: Amino acid and oligopeptide compositions differentiate among phyla

Abstract
The availability of complete genome sequences allows in‐depth comparison of the single‐residue and oligopeptide compositions of the corresponding proteomes. We have used principal component analysis (PCA) to study the landscape of compositional motifs across more than 70 genera from all three superkingdoms. Unexpectedly, the first two principal components clearly differentiate archaea, eubacteria, and eukaryota from each other. In particular, we contrast compositional patterns typical of the three superkingdoms and characterize differences between species and phyla, as well as patterns shared by all compositional proteomic signatures. These species‐specific patterns may even extend to subsets of the entire proteome, such as the proteins encoded on individual yeast chromosomes. We identify factors that affect compositional signatures, such as living habitat, and detect a strong eukaryotic preference for homopeptides and palindromic tripeptides. We further detect oligopeptides that are universally over‐ or underabundant across the whole proteomic landscape, as well as oligopeptides whose over‐ or underabundance is phylum‐ or species‐specific. Finally, we report that species compositional signatures preserve evolutionary memory, providing a new method for comparing phylogenetic relationships among species that avoids the problems of sequence alignment and ortholog detection. Proteins 2004.
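
As a rough illustration of the approach summarized above, the following sketch computes a k‐peptide composition vector for each proteome and projects the vectors onto their first principal component. It is a minimal example under stated assumptions, not the authors' implementation: the function names `composition` and `first_principal_component` are hypothetical, the toy sequences are invented for illustration, and plain power iteration stands in for a full PCA, recovering only the leading component.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(proteome, k=1):
    """Relative frequency of every k-peptide over a list of protein sequences.

    Hypothetical helper: windows containing non-standard residues are ignored.
    """
    counts = Counter()
    for seq in proteome:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    keys = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    total = sum(counts[key] for key in keys)
    return [counts[key] / total if total else 0.0 for key in keys]


def first_principal_component(rows, iterations=200):
    """Leading principal axis and per-row scores, via power iteration.

    Assumption: this stands in for a full PCA and returns only the first
    component; the sign of the axis is arbitrary.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Non-uniform start: a uniform vector is orthogonal to every centered
    # composition row, because each composition sums to one.
    v = [1.0 / (j + 1) for j in range(d)]
    for _ in range(iterations):
        scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
        w = [sum(scores[i] * centered[i][j] for i in range(n)) for j in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            break  # all rows identical: no variance to explain
        v = [x / norm for x in w]
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores


if __name__ == "__main__":
    # Toy "proteomes" (invented sequences, for illustration only).
    toy = {
        "lysine_rich_1": ["KKKKAL"],
        "lysine_rich_2": ["KKKKAV"],
        "alanine_rich": ["AAAALG"],
    }
    rows = [composition(seqs) for seqs in toy.values()]
    axis, scores = first_principal_component(rows)
    for name, score in zip(toy, scores):
        print(name, round(score, 3))
```

Calling `composition(proteome, k=3)` gives the corresponding 8000‐dimensional tripeptide vector, which can be analyzed in the same way. With real proteomes, a library PCA routine would be the more practical choice for obtaining the first two components together.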