ProteoClade: A taxonomic toolkit for multi-species and metaproteomic analysis

Open Access

29 February 2020

journal article
research article
Published by Public Library of Science (PLoS) in PLoS Computational Biology

Vol. 16 (3), e1007741
https://doi.org/10.1371/journal.pcbi.1007741

Abstract

We present ProteoClade, a Python toolkit that performs taxa-specific peptide assignment, protein inference, and quantitation for multi-species proteomics experiments. ProteoClade scales to hundreds of millions of protein sequences, requires minimal computational resources, and is open source, multi-platform, and accessible to non-programmers. We demonstrate its utility for processing quantitative proteomic data from patient-derived xenografts, and show that its speed and scalability enable a novel de novo proteomic workflow for complex microbiota samples.

Author summary

The exponential growth in the number of available reference protein sequences has created an opportunity to taxonomically annotate and quantify complex mixtures of organisms using bottom-up proteomics. However, annotating proteomics data with the relevant taxa is computationally challenging when data sets generate millions of candidate sequences and the reference database contains billions of peptide sequences. Here, we provide a software tool that enables users to perform taxon-specific quantitation on large proteomic data sets without requiring high-performance computing. The tool lets users match the reference database settings to their experimental conditions, and it scales from two-organism studies to the entire UniProt repository. In addition, we provide a de novo analysis workflow that identifies the organisms in a sample without prior specification, analogous to 16S rRNA sequencing.
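The idea of taxon-specific peptide assignment can be illustrated with a small, self-contained Python sketch. It is not ProteoClade's API or implementation: the function names (`tryptic_digest`, `build_index`, `annotate`), the toy sequences, the minimum peptide length, and the choice to treat isoleucine and leucine as equivalent are all assumptions made for illustration. The sketch digests each reference protein in silico, records which taxa can produce each peptide, and then reports the taxa matching each observed peptide, as one might for a human tumor grown in a mouse host.

```python
from collections import defaultdict


def tryptic_digest(protein, min_len=6):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        at_end = i == len(protein) - 1
        if at_end or (aa in "KR" and protein[i + 1] != "P"):
            peptide = protein[start:i + 1]
            if len(peptide) >= min_len:
                peptides.append(peptide)
            start = i + 1
    return peptides


def normalize(peptide):
    """Assumption: treat I and L as the same residue, since they share a mass."""
    return peptide.replace("I", "L")


def build_index(proteins_by_taxon):
    """Map each normalized peptide to the set of taxa that can produce it."""
    index = defaultdict(set)
    for taxon, proteins in proteins_by_taxon.items():
        for protein in proteins:
            for peptide in tryptic_digest(protein):
                index[normalize(peptide)].add(taxon)
    return index


def annotate(observed_peptides, index):
    """Return the sorted taxa for each observed peptide (empty list if unmatched)."""
    return {
        pep: sorted(index.get(normalize(pep), set()))
        for pep in observed_peptides
    }


# Toy, made-up sequences standing in for a two-organism reference database.
reference = {
    "human": ["MKTAYIAKQRLSEVGHEDLK"],
    "mouse": ["MKTAYIAKQRLSDVGHEDLK"],
}
index = build_index(reference)
assignments = annotate(["TAYIAK", "LSEVGHEDLK", "LSDVGHEDLK", "AAAAAAK"], index)

for peptide, taxa in assignments.items():
    label = taxa[0] if len(taxa) == 1 else ("shared" if taxa else "unmatched")
    print(peptide, taxa, label)
```

Peptides that map to exactly one taxon can contribute to taxon-specific quantitation, while shared peptides cannot distinguish the organisms. A production tool would additionally need on-disk storage and a taxonomy hierarchy to handle hundreds of millions of sequences, which this in-memory sketch does not attempt.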

Funding Information

National Institutes of Health (T32 GM007067-41)
National Institutes of Health (R01 CA200893)
National Institutes of Health (R21 CA138308)
National Institutes of Health (R21 CA179452)
