GHOSTX: An Improved Sequence Homology Search Algorithm Using a Query Suffix Array and a Database Suffix Array
Open Access
- 6 August 2014
- journal article
- Published by Public Library of Science (PLoS) in PLOS ONE
- Vol. 9 (8), e103833
- https://doi.org/10.1371/journal.pone.0103833
Abstract
In metagenomic analyses, DNA sequences are translated into protein coding sequences and then assigned to protein families, because protein-level comparison is needed for sensitivity. However, the huge amounts of sequence data involved make even routine homology searches with BLASTX computationally expensive. We designed a new homology search algorithm that finds seed sequences based on the suffix arrays of both a query and a database, and implemented it as GHOSTX. GHOSTX ran approximately 131–165 times faster than a BLASTX search at similar levels of sensitivity. GHOSTX is distributed under the BSD 2-clause license and is available for download at http://www.bi.cs.titech.ac.jp/ghostx/.

Sequencing technology continues to improve, and sequencers are producing ever larger quantities of data. This explosion of sequence data makes computational analysis with contemporary tools increasingly difficult. We offer this tool as a potential solution to this problem.
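The core idea of seeding from two suffix arrays can be illustrated with a deliberately simplified sketch. It is not the GHOSTX implementation, which the abstract does not describe in detail: it assumes exact matches of a fixed length `k` over plain amino-acid strings, and the names `suffix_array`, `find_seeds` and `k` are hypothetical. Because both suffix arrays are sorted, the sketch walks them together in one merge-like pass and pairs every query position with every database position sharing the same length-`k` prefix, instead of scanning the database once per query position.

```python
from typing import List, Tuple


def suffix_array(s: str) -> List[int]:
    """Naive construction: sort suffix start positions lexicographically.

    O(n^2 log n); used here only for readability.
    """
    return sorted(range(len(s)), key=lambda i: s[i:])


def find_seeds(query: str, db: str, k: int) -> List[Tuple[int, int]]:
    """Return (query_pos, db_pos) pairs whose length-k substrings match exactly.

    Hypothetical simplification: fixed-length exact seeds, no scoring
    matrix or reduced alphabet.
    """
    # Keep only suffixes long enough to hold a k-mer; filtering a sorted
    # list preserves order, so the k-prefixes stay in non-decreasing order.
    qsa = [i for i in suffix_array(query) if i + k <= len(query)]
    dsa = [i for i in suffix_array(db) if i + k <= len(db)]
    seeds: List[Tuple[int, int]] = []
    qi = di = 0
    while qi < len(qsa) and di < len(dsa):
        qkey = query[qsa[qi]:qsa[qi] + k]
        dkey = db[dsa[di]:dsa[di] + k]
        if qkey < dkey:
            qi += 1  # query prefix is smaller: advance in the query array
        elif qkey > dkey:
            di += 1  # database prefix is smaller: advance in the database array
        else:
            # Equal k-prefixes form contiguous runs in both sorted arrays.
            qj = qi
            while qj < len(qsa) and query[qsa[qj]:qsa[qj] + k] == qkey:
                qj += 1
            dj = di
            while dj < len(dsa) and db[dsa[dj]:dsa[dj] + k] == qkey:
                dj += 1
            # Every query occurrence pairs with every database occurrence.
            seeds.extend((q, d) for q in qsa[qi:qj] for d in dsa[di:dj])
            qi, di = qj, dj
    return sorted(seeds)


print(find_seeds("MKVLAT", "AKVLMKV", 3))
```

For `find_seeds("MKVLAT", "AKVLMKV", 3)` the result is `[(0, 4), (1, 1)]`: the 3-mer `MKV` at query position 0 matches database position 4, and `KVL` occurs at position 1 in both strings. A practical tool would build the suffix arrays with a linear or near-linear construction algorithm rather than the naive sort shown here.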