A fast CUDA implementation of agrep algorithm for approximate nucleotide sequence matching
- 1 June 2011
- conference paper
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE) in 2011 IEEE 9th Symposium on Application Specific Processors (SASP)
Abstract
The availability of huge amounts of nucleotide sequences catalyzes the development of fast algorithms for approximate DNA and RNA string matching. However, most existing online algorithms can only handle small scale problems. When querying large genomes, their performance becomes unacceptable. Offline algorithms such as Bowtie and BWA require building indexes, and their memory requirement is high. We have developed a fast CUDA implementation of agrep algorithm for approximate nucleotide sequence matching by exploiting the huge computational power of modern GPU hardware. Our CUDA program is capable of searching large genomes for patterns of length up to 64 with edit distance up to 9. For example, it is able to search the entire human genome (3.10 Gbp in 24 chromosomes) for patterns of lengths of 30 and 60 with edit distances of 3 and 6 within 371 and 1,188 milliseconds respectively on one NVIDIA GeForce GTX285 graphics card, achieving 70-fold and 36-fold speedups over multithreaded QuadCore CPU counterpart. Our program employs online approach and does not require building indexes of any kind, it thus can be applied in real time. Using two-bits-for-one-character binary representation, its memory requirement is merely one fourth of the original genome size. Therefore it is possible to load multiple genomes simultaneously. The x86 and x64 executables for Linux and Windows, C++ source code, documentations, user manual, and an AJAX MVC website for online real time searching are available at http://agrep.cse.cuhk.edu.hk. Users can also send emails to CUDAagrepGmail.com to queue up for a job.Keywords
This publication has 8 references indexed in Scilit:
- GPU-RMAP: Accelerating Short-Read Mapping on Graphics ProcessorsPublished by Institute of Electrical and Electronics Engineers (IEEE) ,2010
- A fast algorithm for exact sequence search in biological sequences using polyphase decompositionBioinformatics, 2010
- Serial and parallel methods for i/o efficient suffix tree constructionPublished by Association for Computing Machinery (ACM) ,2009
- Fast and accurate short read alignment with Burrows–Wheeler transformBioinformatics, 2009
- Ultrafast and memory-efficient alignment of short DNA sequences to the human genomeGenome Biology, 2009
- On-Line Approximate String Matching with Bounded ErrorsPublished by Springer Science and Business Media LLC ,2008
- Fast Matching Method for DNA SequencesLecture Notes in Computer Science, 2007
- Fast text searchingCommunications of the ACM, 1992