A two‐step database search method improves sensitivity in peptide sequence matches for metaproteomics and proteogenomics studies

15 February 2013

journal article
research article
Published by Wiley in Proteomics

Vol. 13 (8), 1352-1357
https://doi.org/10.1002/pmic.201200352

Abstract

Large databases (>10⁶ sequences) used in metaproteomic and proteogenomic studies make it difficult to match peptide sequences to MS/MS data using database‐search programs. Most notably, the strict filtering needed to avoid false‐positive matches produces more false negatives, which limits the number of peptide matches. To address this challenge, we developed a two‐step method in which matches derived from a primary search against a large database were used to create a smaller subset database. The second search was performed against a target‐decoy version of this subset database merged with a host database. High‐confidence peptide sequence matches were then used to infer protein identities. Applied to both metaproteomic and proteogenomic analyses, the two‐step method yielded twice as many high‐confidence peptide sequence matches in each case as the conventional one‐step method. The two‐step method captured almost all of the peptides matched by the one‐step method, and a majority of the additional matches were false negatives from the one‐step method. Furthermore, the two‐step method improved results regardless of the database search program used. Our results show that the two‐step method maximizes peptide matching sensitivity for applications requiring large databases, which makes it especially valuable for proteogenomics and metaproteomics studies.
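The database construction for the second search can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `build_second_step_database`, the dictionary representation of a sequence database, the `REV_` prefix, and the use of reversed sequences as decoys are all assumptions made for illustration, since the abstract does not say how the decoys were generated. The sketch keeps only the large-database entries matched in the primary search, merges them with the host database, and appends one decoy per target.

```python
def build_second_step_database(large_db, primary_hits, host_db, decoy_prefix="REV_"):
    """Build a target-decoy database for the second search (illustrative only).

    large_db, host_db: dict mapping accession -> protein sequence (assumed format).
    primary_hits: accessions of large-database entries matched in the primary search.
    decoy_prefix: hypothetical label used to mark decoy entries.
    """
    # Step 1: subset database = large-database entries that had a primary match.
    subset = {acc: large_db[acc] for acc in sorted(set(primary_hits)) if acc in large_db}

    # Step 2: merge the subset with the host database to form the targets.
    targets = {**subset, **host_db}

    # Step 3: add one decoy per target. Reversal is an assumption of this sketch.
    decoys = {decoy_prefix + acc: seq[::-1] for acc, seq in targets.items()}

    return {**targets, **decoys}
```

For example, given a three-entry large database in which only two entries were matched in the primary search, plus a single host entry, the function returns six entries: three targets and their three reversed decoys. The search itself and the confidence filtering of the resulting matches are left to whichever database search program is used.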

Funding Information

National Institutes of Health (1R01 DE17734)
NSF (1147079)
