Informatics Strategies for Large-Scale Novel Cross-Linking Analysis

Abstract
Detecting protein interactions in biological systems remains a significant challenge for current technology. Chemical cross-linking can introduce new chemical bonds into a complex system, producing mass changes in a set of tryptic peptides that are detectable by mass spectrometry. However, system complexity and the heterogeneity of cross-linking products have precluded widespread use of chemical cross-linking for large-scale identification of protein-protein interactions. The development of mass spectrometry identifiable cross-linkers called protein interaction reporters (PIRs) has enabled on-cell chemical cross-linking experiments with product type differentiation. The complex datasets resulting from PIR experiments, however, demand new informatics capabilities for their interpretation. This manuscript details our efforts to develop such capabilities and describes the program X-links, which allows PIR product type differentiation. We also present results from Monte Carlo simulations of PIR-type experiments that provide false discovery rate estimates for PIR product type identification from observed precursor and released peptide masses. The simulations further provide peptide identification calculations, based on accurate masses and database complexity, from which false discovery rates for peptide identification can be estimated. Overall, the calculations show a low rate of false discovery of PIR product types due to random mass matching: approximately 12% with 10 ppm mass measurement accuracy and spectral complexity resulting from 100 peptides. In addition, using a reduced database of 367 proteins, obtained from stage 1 analysis of Shewanella oneidensis MR-1, significantly reduced the estimated false discovery rate for peptide identification compared with that obtained using the entire Shewanella oneidensis MR-1 proteome.
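
The random mass matching idea behind the Monte Carlo estimate can be illustrated with a minimal sketch. It is not the X-links implementation and is not expected to reproduce the reported 12% figure. It assumes, for illustration only, that an inter-peptide PIR product satisfies precursor mass = released peptide 1 + released peptide 2 + reporter mass, that peptide masses are uniformly distributed over a fixed range, and that the reporter mass is a placeholder value. The function names `has_pair_sum` and `simulate_false_match_rate` and all default parameters other than the 10 ppm tolerance and 100 peptides are hypothetical.

```python
import random


def has_pair_sum(sorted_masses, target, tol):
    """Return True if any two distinct entries sum to target within +/- tol.

    Uses the two-pointer scan over an ascending list.
    """
    i, j = 0, len(sorted_masses) - 1
    while i < j:
        total = sorted_masses[i] + sorted_masses[j]
        if abs(total - target) <= tol:
            return True
        if total < target:
            i += 1
        else:
            j -= 1
    return False


def simulate_false_match_rate(n_peptides=100, ppm=10.0, n_trials=2000,
                              reporter_mass=750.0,          # placeholder, not a real PIR value
                              mass_range=(500.0, 3000.0),   # assumed uniform peptide mass range (Da)
                              seed=0):
    """Estimate how often a random precursor is 'explained' by chance.

    Each trial draws n_peptides random released-peptide masses and one
    unrelated precursor mass, then checks whether any peptide pair plus
    the reporter mass matches the precursor within the ppm tolerance.
    """
    rng = random.Random(seed)
    low, high = mass_range
    hits = 0
    for _ in range(n_trials):
        released = sorted(rng.uniform(low, high) for _ in range(n_peptides))
        precursor = rng.uniform(2 * low + reporter_mass, 2 * high + reporter_mass)
        tol = precursor * ppm * 1e-6  # absolute tolerance in Da at the precursor mass
        if has_pair_sum(released, precursor - reporter_mass, tol):
            hits += 1
    return hits / n_trials


if __name__ == "__main__":
    for ppm in (1.0, 10.0, 50.0):
        rate = simulate_false_match_rate(ppm=ppm)
        print(f"{ppm:5.1f} ppm: random match rate = {rate:.3f}")
```

Under these assumptions, varying `ppm` and `n_peptides` shows how the chance of a random match depends on mass measurement accuracy and spectral complexity; realistic estimates would require peptide masses drawn from an actual proteome digest.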
