Large-Scale Identification of N-Terminal Peptides in the Halophilic Archaea Halobacterium salinarum and Natronomonas pharaonis

Abstract
Characterization of protein N-terminal peptides supports the quality assessment of data derived from genomic sequences (e.g., the correct assignment of start codons) and hints at in vivo N-terminal modifications such as N-terminal acetylation and removal of the initiator methionine. The current work represents the first large-scale identification of N-terminal peptides from prokaryotes, covering the two halophilic euryarchaeota Halobacterium salinarum and Natronomonas pharaonis. Two methods were used that specifically allow the characterization of protein N-terminal peptides: combined fractional diagonal chromatography (COFRADIC) and strong cation exchange chromatography (SCX), both known to enrich for N-terminally blocked peptides. In addition to these specific methods, N-terminal peptide identifications were extracted from our previous genome-wide proteomic data. Combining all data, 606 N-terminal peptides from Hbt. salinarum and 328 from Nmn. pharaonis were reliably identified. These results constitute the largest available dataset of identified and characterized protein N-termini for prokaryotes (archaea and bacteria). They allowed the validation and improvement of start codon assignments, as automatic gene finders tend to misassign start codons in GC-rich genomes. In addition, the dataset allowed unravelling N-terminal protein maturation in archaea, showing that 60% of the proteins undergo methionine cleavage and that, in contrast to current knowledge, Nα-acetylation is common in the archaeal domain of life, with 13−18% of the proteins being Nα-acetylated. The protein sets described in this paper are available by FTP and might be used as reference sets to test the performance of new gene finders.

Keywords: Halobacterium salinarum • Natronomonas pharaonis • archaea • halophilic • SCX • ESI Q-TOF • LC−MS/MS • N-terminal peptide • COFRADIC • gene finder