A New Tag Index Scheme Enables Fast Peptide Retrieval for Protein Identification

Open Access

1 January 2022

journal article
research article
Published by Scientific Research Publishing, Inc. in Journal of Computer and Communications

Vol. 10 (04), 14-23
https://doi.org/10.4236/jcc.2022.104002

Abstract

Sequence tag indexes are used in computational proteomics to speed up open-search-based identification of modified peptides and to support in-depth analysis of mass spectrometry data. Over the past ten years, they have played a prominent role in protein-identification search engines because of their fast search speed. However, in pursuit of lower index space consumption, some protein search engines adopt excessively concise index schemes, which lead to a higher computational burden. We propose a new tag index scheme, named TIIP, that strikes a better balance between space and time complexity. TIIP has a unique two-level hierarchical index structure that allows rapid retrieval of all peptide sequences and their corresponding masses. In theory, the index space consumption of TIIP is not much higher than that of typical tag index schemes, while the time complexity of sequence retrieval is reduced to O(1); in practice, TIIP achieves about a million-fold improvement in search speed over a brute-force approach.
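
The abstract does not spell out TIIP's internal layout, so the following is only a minimal sketch of the general idea of a two-level tag index, not the authors' implementation. It assumes a first level that maps each fixed-length tag to the peptides containing it, and a second level that stores each peptide's sequence with a precomputed mass, so a tag lookup is an average-case constant-time hash access rather than a scan. The names `TwoLevelTagIndex`, `lookup`, `brute_force`, and the tag length of 3 are all hypothetical choices made for illustration.

```python
from collections import defaultdict

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER_MASS = 18.010565  # one H2O is added to the residue sum of a peptide


def peptide_mass(sequence):
    """Monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER_MASS


class TwoLevelTagIndex:
    """Hypothetical two-level index: tag -> peptide ids -> (sequence, mass)."""

    def __init__(self, peptides, tag_len=3):
        self.tag_len = tag_len
        # Second level: peptide id -> (sequence, precomputed mass).
        self.peptides = [(seq, peptide_mass(seq)) for seq in peptides]
        # First level: tag -> list of (peptide id, offset of the tag).
        self.tags = defaultdict(list)
        for pid, (seq, _) in enumerate(self.peptides):
            for offset in range(len(seq) - tag_len + 1):
                self.tags[seq[offset:offset + tag_len]].append((pid, offset))

    def lookup(self, tag):
        """Return (sequence, mass, offset) for every occurrence of the tag."""
        hits = self.tags.get(tag, [])  # single hash access, no scanning
        return [(self.peptides[pid][0], self.peptides[pid][1], offset)
                for pid, offset in hits]


def brute_force(peptides, tag):
    """Baseline for comparison: scan every peptide for the tag."""
    results = []
    for seq in peptides:
        start = seq.find(tag)
        while start != -1:
            results.append((seq, peptide_mass(seq), start))
            start = seq.find(tag, start + 1)
    return results


peptides = ["PEPTIDE", "TIDES", "ACDK"]
index = TwoLevelTagIndex(peptides)
for sequence, mass, offset in index.lookup("TID"):
    print(sequence, round(mass, 4), offset)
```

The trade-off in this sketch mirrors the one the abstract describes: storing every tag occurrence and every peptide mass costs extra memory, but retrieval no longer depends on the size of the peptide database, only on the number of hits returned.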
