Focus on the spectra that matter by clustering of quantification data in shotgun proteomics

Open Access

26 June 2020

journal article
research article
Published by Springer Science and Business Media LLC in Nature Communications

Vol. 11 (1), 1-12
https://doi.org/10.1038/s41467-020-17037-3

Abstract

In shotgun proteomics, the analysis of label-free quantification experiments is typically limited by the identification rate and the noise level in the quantitative data, which generally leads to low sensitivity in differential expression analysis. Here, we propose a quantification-first approach for peptides that reverses the classical identification-first workflow, thereby preventing valuable information from being discarded in the identification stage. Specifically, we introduce a method, Quandenser, that applies unsupervised clustering at both the MS1 and MS2 levels to summarize all analytes of interest without assigning identities. The resulting data reduction shortens search time. We can now employ open modification and de novo searches to identify analytes of interest that would have gone unnoticed in traditional pipelines. Quandenser+Triqler outperforms the state-of-the-art method MaxQuant+Perseus, consistently reporting more differentially abundant proteins for all tested datasets. Software is available for all major operating systems at https://github.com/statisticalbiotechnology/quandenser, under the Apache 2.0 license.

Matching mass spectra to peptide sequences is the usual first step in proteomics data analysis, often followed by peptide quantification. Here, the authors show that clustering and quantifying mass spectral features prior to peptide identification can increase the sensitivity of label-free quantitative proteomics.
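The idea of grouping spectra before any identity is assigned can be illustrated with a toy example. The sketch below is not Quandenser's algorithm; it is a minimal illustration that assumes spectra are given as lists of (m/z, intensity) peaks, bins them at a fixed width, and links any two spectra whose cosine similarity meets a threshold. The function names, the bin width, the threshold, and the peak values are all hypothetical choices made for this example.

```python
import math
from collections import defaultdict

def bin_spectrum(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins (assumed bin width)."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz / bin_width)] += intensity
    return dict(binned)

def cosine(a, b):
    """Cosine similarity between two sparse binned spectra."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def cluster_spectra(spectra, threshold=0.8):
    """Single-linkage clustering: link spectra with cosine >= threshold.

    Returns a sorted list of clusters, each a list of spectrum indices.
    No peptide identities are used at any point.
    """
    binned = [bin_spectrum(s) for s in spectra]
    parent = list(range(len(spectra)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(binned)):
        for j in range(i + 1, len(binned)):
            if cosine(binned[i], binned[j]) >= threshold:
                parent[find(j)] = find(i)

    clusters = defaultdict(list)
    for i in range(len(spectra)):
        clusters[find(i)].append(i)
    return sorted(clusters.values())

# Hypothetical peak lists: spectra 0, 1 and 4 resemble each other, as do 2 and 3.
spectra = [
    [(100.1, 10), (200.2, 50), (300.3, 20)],
    [(100.2, 12), (200.1, 45), (300.4, 22)],
    [(150.0, 30), (250.0, 30), (350.0, 5)],
    [(150.3, 28), (250.2, 33), (350.1, 6)],
    [(100.0, 11), (200.3, 52), (300.2, 18)],
]
clusters = cluster_spectra(spectra)
print(clusters)
```

In this toy input, five spectra collapse into two clusters, so only two representative spectra would need to be passed to a search engine. That is the kind of data reduction the abstract credits with shortening search time, though the real method also works at the MS1 level and on far larger inputs.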

Funding Information

Vetenskapsrådet (2017-04030)
