Speeding up tandem mass spectrometry-based database searching by longest common prefix

Open Access

25 November 2010

journal article
Published by Springer Science and Business Media LLC in BMC Bioinformatics

Vol. 11 (1), 577
https://doi.org/10.1186/1471-2105-11-577

Abstract

Background: Tandem mass spectrometry-based database searching has become an important technology for peptide and protein identification. One of the key challenges in database searching is the remarkable increase in computational demand, brought about by the expansion of protein databases, semi- or non-specific enzymatic digestion, post-translational modifications and other factors. Some software tools use peptide indexing to accelerate processing. However, a peptide index requires a large amount of time and space to construct, especially for non-specific digestion, and it is inflexible to use.

Results: We developed an algorithm based on the longest common prefix (ABLCP) to efficiently organize a protein sequence database. The longest common prefix is a data structure that is always coupled to the suffix array. It eliminates redundant candidate peptides in databases and reduces the corresponding amount of peptide-spectrum matching, thereby decreasing the identification time. The algorithm relies on the properties of the longest common prefix. Although enzymatic digestion poses a challenge to these properties, the algorithm can be adjusted to ensure that no candidate peptides are omitted. Compared with peptide indexing, ABLCP requires much less time and space for construction and is subject to fewer restrictions.

Conclusions: The ABLCP algorithm can help to improve data analysis efficiency. A software tool implementing this algorithm is available at http://pfind.ict.ac.cn/pfind2dot5/index.htm.
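The idea of using the longest common prefix to skip redundant candidate peptides can be illustrated with a minimal Python sketch. It is not the authors' ABLCP implementation: it assumes fully non-specific digestion (every substring within a length window is a candidate), uses a naive suffix array construction, and the names build_suffix_array, build_lcp_array, distinct_peptides and the "$" separator are hypothetical choices made for illustration. The property it relies on is that any prefix of a suffix no longer than its LCP with the preceding suffix in sorted order has already been seen, so it does not need to be matched again.

```python
def build_suffix_array(text):
    # Naive construction by sorting suffixes; adequate for illustration only.
    return sorted(range(len(text)), key=lambda i: text[i:])


def build_lcp_array(text, sa):
    # Kasai's algorithm: lcp[r] is the length of the common prefix shared by
    # the suffixes at ranks r-1 and r of the suffix array (lcp[0] = 0).
    n = len(text)
    rank = [0] * n
    for r, start in enumerate(sa):
        rank[start] = r
    lcp = [0] * n
    h = 0
    for i in range(n):
        if rank[i] > 0:
            j = sa[rank[i] - 1]
            while i + h < n and j + h < n and text[i + h] == text[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1
        else:
            h = 0
    return lcp


def distinct_peptides(proteins, min_len, max_len, sep="$"):
    # Assumption: sep does not occur in any protein sequence and min_len >= 1.
    text = sep.join(proteins) + sep
    sa = build_suffix_array(text)
    lcp = build_lcp_array(text, sa)
    peptides = []
    for r, start in enumerate(sa):
        # A candidate peptide must not run past the end of its protein.
        end = text.index(sep, start)
        longest = min(max_len, end - start)
        # Prefixes of length <= lcp[r] were already emitted at an earlier rank.
        shortest = max(min_len, lcp[r] + 1)
        for length in range(shortest, longest + 1):
            peptides.append(text[start:start + length])
    return peptides
```

For the toy input ["MKAMK", "AMKA"] with lengths 2 to 3, plain enumeration would produce 12 candidate substrings, while distinct_peptides returns each of the 6 distinct ones once. Handling specific or semi-specific enzymatic cleavage rules, which the abstract notes requires adjustments, is outside the scope of this sketch.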

Cited by 8 articles