ProteoClade: A taxonomic toolkit for multi-species and metaproteomic analysis
Open Access
- 29 February 2020
- journal article
- research article
- Published by Public Library of Science (PLoS) in PLoS Computational Biology
- Vol. 16 (3), e1007741
- https://doi.org/10.1371/journal.pcbi.1007741
Abstract
We present ProteoClade, a Python toolkit that performs taxa-specific peptide assignment, protein inference, and quantitation for multi-species proteomics experiments. ProteoClade scales to hundreds of millions of protein sequences, requires minimal computational resources, and is open source, multi-platform, and accessible to non-programmers. We demonstrate its utility for processing quantitative proteomic data from patient-derived xenografts, and its speed and scalability enable a novel de novo proteomic workflow for complex microbiota samples.
Author summary
The exponential growth in the number of available reference protein sequences has provided an opportunity to taxonomically annotate and quantify complex mixtures of organisms using bottom-up proteomics. However, annotating proteomics data with the relevant taxa is computationally challenging when data sets generate millions of candidate sequences and the reference database contains billions of peptide sequences. Here, we provide a software tool that enables users to perform taxon-specific quantitation on large proteomic data sets without requiring high-performance computing. The tool lets users match the reference database settings to their experimental conditions, and it scales from two-organism studies to the entire UniProt repository. In addition, we provide a de novo analysis workflow that identifies the organisms in a sample without prior specification, analogous to 16S rRNA sequencing.
Funding Information
- National Institutes of Health (T32 GM007067-41)
- National Institutes of Health (R01 CA200893)
- National Institutes of Health (R21 CA138308)
- National Institutes of Health (R21 CA179452)
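The taxon-specific peptide assignment described in the abstract can be illustrated with a minimal Python sketch. It does not use ProteoClade's own API. The function names, the simplified trypsin rule (cleave after K or R unless followed by P, no missed cleavages), the minimum peptide length, and the toy sequences are all assumptions made for illustration. Peptides found in a single taxon are labeled with that taxon, and peptides found in more than one are labeled as shared.

```python
import re
from collections import defaultdict


def tryptic_digest(sequence, min_length=7):
    """Split a protein after K or R (not before P); no missed cleavages.

    Simplified rule assumed for illustration only.
    """
    peptides = re.split(r"(?<=[KR])(?!P)", sequence)
    return {p for p in peptides if len(p) >= min_length}


def build_peptide_index(proteins_by_taxon, min_length=7):
    """Map each in silico peptide to the set of taxa that contain it."""
    index = defaultdict(set)
    for taxon, sequences in proteins_by_taxon.items():
        for seq in sequences:
            for peptide in tryptic_digest(seq, min_length):
                index[peptide].add(taxon)
    return index


def annotate(peptide, index):
    """Return the unique taxon, 'shared', or 'unmatched' for a peptide."""
    taxa = index.get(peptide, set())
    if not taxa:
        return "unmatched"
    if len(taxa) == 1:
        return next(iter(taxa))
    return "shared"


# Toy sequences (hypothetical, not real proteins).
reference = {
    "human": ["MAAAAAAAKGGGGGGGGRHHHHHHHHK"],
    "mouse": ["MAAAAAAAKGGGGGGGGRWWWWWWWWK"],
}
index = build_peptide_index(reference)

for observed in ["HHHHHHHHK", "WWWWWWWWK", "MAAAAAAAK", "CCCCCCCCK"]:
    print(observed, annotate(observed, index))
```

In a xenograft-style setting, only the peptides labeled with a single taxon would contribute to organism-specific quantitation. How shared peptides are treated is a design choice that this sketch leaves open.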