A fast CUDA implementation of agrep algorithm for approximate nucleotide sequence matching

Abstract

The availability of huge amounts of nucleotide sequences catalyzes the development of fast algorithms for approximate DNA and RNA string matching. However, most existing online algorithms can only handle small scale problems. When querying large genomes, their performance becomes unacceptable. Offline algorithms such as Bowtie and BWA require building indexes, and their memory requirement is high. We have developed a fast CUDA implementation of agrep algorithm for approximate nucleotide sequence matching by exploiting the huge computational power of modern GPU hardware. Our CUDA program is capable of searching large genomes for patterns of length up to 64 with edit distance up to 9. For example, it is able to search the entire human genome (3.10 Gbp in 24 chromosomes) for patterns of lengths of 30 and 60 with edit distances of 3 and 6 within 371 and 1,188 milliseconds respectively on one NVIDIA GeForce GTX285 graphics card, achieving 70-fold and 36-fold speedups over multithreaded QuadCore CPU counterpart. Our program employs online approach and does not require building indexes of any kind, it thus can be applied in real time. Using two-bits-for-one-character binary representation, its memory requirement is merely one fourth of the original genome size. Therefore it is possible to load multiple genomes simultaneously. The x86 and x64 executables for Linux and Windows, C++ source code, documentations, user manual, and an AJAX MVC website for online real time searching are available at http://agrep.cse.cuhk.edu.hk. Users can also send emails to CUDAagrepGmail.com to queue up for a job.

Keywords

This publication has 8 references indexed in Scilit:

GPU-RMAP: Accelerating Short-Read Mapping on Graphics Processors
Published by Institute of Electrical and Electronics Engineers (IEEE) ,2010
A fast algorithm for exact sequence search in biological sequences using polyphase decomposition
Bioinformatics, 2010
Serial and parallel methods for i/o efficient suffix tree construction
Published by Association for Computing Machinery (ACM) ,2009
Fast and accurate short read alignment with Burrows–Wheeler transform
Bioinformatics, 2009
Ultrafast and memory-efficient alignment of short DNA sequences to the human genome
Genome Biology, 2009
On-Line Approximate String Matching with Bounded Errors
Published by Springer Science and Business Media LLC ,2008
Fast Matching Method for DNA Sequences
Lecture Notes in Computer Science, 2007
Fast text searching
Communications of the ACM, 1992

Cited by 10 articles