Integrated Approach for Manual Evaluation of Peptides Identified by Searching Protein Sequence Databases with Tandem Mass Spectra
- 24 May 2005
- journal article
- research article
- Published by American Chemical Society (ACS) in Journal of Proteome Research
- Vol. 4 (3), 998-1005
- https://doi.org/10.1021/pr049754t
Abstract
Quantitative proteomics relies on accurate protein identification, which is often carried out by automated searching of a sequence database with tandem mass spectra of peptides. When these spectra contain limited information, automated searches may lead to incorrect peptide identifications. It is therefore necessary to validate the identifications by careful manual inspection of the mass spectra. Not only is this task time-consuming, but the reliability of the validation varies with the experience of the analyst. Here, we report a systematic approach to evaluating peptide identifications made by automated search algorithms. The method is based on two principles: the candidate peptide sequence should adequately explain the observed fragment ions, and the mass errors of neighboring fragments should be similar. To evaluate our method, we studied tandem mass spectra obtained from tryptic digests of E. coli and HeLa cells. Candidate peptides were identified with the automated search engine Mascot and subjected to the manual validation method. The method found correct peptide identifications that were given low Mascot scores (e.g., 20−25) and incorrect peptide identifications that were given high Mascot scores (e.g., 40−50). The method comprehensively detected false results from searches designed to produce incorrect identifications. Comparison of the tandem mass spectra of synthetic candidate peptides to the spectra obtained from the complex peptide mixtures confirmed the accuracy of the evaluation method. Thus, the evaluation approach described here could help boost the accuracy of protein identification, increase the number of peptides identified, and provide a step toward developing a more accurate next-generation algorithm for protein identification.
Keywords: protein identification • manual evaluation • automated database search
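The two criteria named in the abstract (the candidate sequence should explain the observed fragment ions, and neighboring fragments should show similar mass errors) can be illustrated with a minimal Python sketch. This is not the authors' procedure: the function names, the 0.5 Da matching tolerance, and the restriction to singly charged b and y ions without modifications are all assumptions made for illustration. Residue masses are standard monoisotopic values.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def fragment_ions(peptide):
    """Return theoretical singly charged b- and y-ion m/z values by label."""
    masses = [MONO[aa] for aa in peptide]
    ions = {}
    for i in range(1, len(peptide)):
        ions[f"b{i}"] = sum(masses[:i]) + PROTON
        ions[f"y{i}"] = sum(masses[-i:]) + WATER + PROTON
    return ions


def match_fragments(peptide, peaks, tol=0.5):
    """Map each ion label to its mass error (observed - theoretical).

    An ion is matched to the nearest observed peak if that peak lies
    within `tol` Da (hypothetical tolerance); unmatched ions are omitted.
    """
    if not peaks:
        return {}
    matches = {}
    for label, mz in fragment_ions(peptide).items():
        nearest = min(peaks, key=lambda p: abs(p - mz))
        err = nearest - mz
        if abs(err) <= tol:
            matches[label] = err
    return matches


def neighbor_error_spread(matches, series="y"):
    """Largest mass-error difference between consecutive matched ions."""
    idx = sorted(int(k[1:]) for k in matches if k[0] == series)
    diffs = [
        abs(matches[f"{series}{a}"] - matches[f"{series}{b}"])
        for a, b in zip(idx, idx[1:])
        if b == a + 1
    ]
    return max(diffs, default=0.0)
```

Under these assumptions, a candidate that matches few ions, or whose consecutive matched ions show widely differing errors, would be flagged for closer inspection; which thresholds are appropriate depends on the instrument and is not specified here.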