A two‐step database search method improves sensitivity in peptide sequence matches for metaproteomics and proteogenomics studies
- 15 February 2013
- journal article
- research article
- Published by Wiley in Proteomics
- Vol. 13 (8), 1352-1357
- https://doi.org/10.1002/pmic.201200352
Abstract
Large databases (>10⁶ sequences) used in metaproteomic and proteogenomic studies make it difficult to match peptide sequences to MS/MS data with database‐search programs. Most notably, the strict filtering needed to avoid false‐positive matches produces more false negatives, which limits the number of peptide matches. To address this challenge, we developed a two‐step method in which matches from a primary search against a large database were used to create a smaller subset database. The second search was performed against a target‐decoy version of this subset database merged with a host database. High‐confidence peptide sequence matches were then used to infer protein identities. For both metaproteomic and proteogenomic analysis, the two‐step method yielded twice as many high‐confidence peptide sequence matches as the conventional one‐step method. The two‐step method captured almost all of the peptides matched by the one‐step method, and most of the additional matches were false negatives of the one‐step method. Furthermore, the two‐step method improved results regardless of the database search program used. These results show that the two‐step method maximizes peptide‐matching sensitivity for applications requiring large databases, which is especially valuable for proteogenomics and metaproteomics studies.
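The database-construction and filtering steps summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several details here are assumptions that the abstract does not state: protein databases are modeled as plain `accession -> sequence` dicts, decoys are reversed sequences tagged with a hypothetical `DECOY_` prefix, decoys are generated after the subset and host databases are merged, the search engine itself is left abstract, and the false discovery rate (FDR) is estimated with the simple decoys/targets ratio. All function names are hypothetical.

```python
"""Hypothetical sketch of two-step search database construction and
target-decoy filtering (assumed details, not the published workflow)."""
from typing import Dict, Iterable, List, Tuple

DECOY_PREFIX = "DECOY_"          # hypothetical marker for decoy accessions
ProteinDB = Dict[str, str]       # accession -> amino-acid sequence
PSM = Tuple[str, float, bool]    # (peptide, score, is_decoy); higher score is better


def build_subset_database(large_db: ProteinDB, first_pass_hits: Iterable[str]) -> ProteinDB:
    """Keep only the proteins matched in the primary (step 1) search."""
    hits = set(first_pass_hits)
    return {acc: seq for acc, seq in large_db.items() if acc in hits}


def add_reversed_decoys(db: ProteinDB) -> ProteinDB:
    """Return every target entry plus a reversed-sequence decoy (assumed decoy scheme)."""
    target_decoy = dict(db)
    for acc, seq in db.items():
        target_decoy[DECOY_PREFIX + acc] = seq[::-1]
    return target_decoy


def second_step_database(large_db: ProteinDB, first_pass_hits: Iterable[str],
                         host_db: ProteinDB) -> ProteinDB:
    """Merge the subset with the host database, then add decoys for all targets.
    On an accession clash, the host entry overwrites the subset entry."""
    merged = {**build_subset_database(large_db, first_pass_hits), **host_db}
    return add_reversed_decoys(merged)


def filter_at_fdr(psms: List[PSM], max_fdr: float) -> List[str]:
    """Return target peptides from the longest score-ranked prefix whose
    estimated FDR (decoys / targets) is at most max_fdr."""
    ranked = sorted(psms, key=lambda psm: psm[1], reverse=True)
    targets = decoys = 0
    cutoff = 0  # number of top-ranked PSMs accepted so far
    for i, (_, _, is_decoy) in enumerate(ranked, start=1):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            cutoff = i
    return [pep for pep, _, is_decoy in ranked[:cutoff] if not is_decoy]


if __name__ == "__main__":
    step2_db = second_step_database(
        {"P1": "PEPTIDEK", "P2": "MKLV", "P3": "AAAA"},  # toy "large" database
        ["P1", "P3"],                                    # accessions matched in step 1
        {"H1": "HOSTSEQ"},                               # toy host database
    )
    print(sorted(step2_db))  # ['DECOY_H1', 'DECOY_P1', 'DECOY_P3', 'H1', 'P1', 'P3']

    toy_psms = [("A", 10.0, False), ("B", 9.0, False), ("X", 8.0, True),
                ("C", 7.0, False), ("Y", 6.0, True)]
    print(filter_at_fdr(toy_psms, 0.5))   # ['A', 'B', 'C']
    print(filter_at_fdr(toy_psms, 0.01))  # ['A', 'B']
```

In the toy scores, tightening `max_fdr` from 0.5 to 0.01 drops peptide `C`. This is a small illustration of the trade-off the abstract describes: stricter filtering removes false positives but also discards matches that may be correct.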
Funding Information
- National Institutes of Health (1R01 DE17734)
- NSF (1147079)