merAligner: A Fully Parallel Sequence Aligner
- 1 May 2015
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- p. 561-570
- https://doi.org/10.1109/ipdps.2015.96
Abstract
Aligning a set of query sequences to a set of target sequences is an important task in bioinformatics. In this work we present merAligner, a highly parallel sequence aligner that implements a seed-and-extend algorithm and employs parallelism in all of its components. merAligner relies on a high-performance distributed hash table (the seed index) and uses the one-sided communication capabilities of Unified Parallel C (UPC) to facilitate fine-grained parallelism. We leverage communication optimizations during the construction of the distributed hash table, and software caching schemes to reduce communication during the alignment phase. Additionally, merAligner preprocesses the target sequences to extract properties that enable exact sequence matching with minimal communication. Finally, we efficiently parallelize the I/O-intensive phases and implement an effective load-balancing scheme. Results show that merAligner scales efficiently up to thousands of cores on a Cray XC30 supercomputer using real human and wheat genome data, while significantly outperforming existing parallel alignment tools.
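As a rough illustration of the seed-and-extend idea named in the abstract, the following single-process Python sketch builds a seed index over the target sequences and extends each seed hit into a maximal exact match. It is not merAligner's implementation: the function names are hypothetical, a plain in-memory dictionary stands in for the distributed hash table, and the ungapped exact extension is a simplifying assumption.

```python
from collections import defaultdict


def build_seed_index(targets, k):
    """Map every length-k substring (seed) of each target to its (target_id, offset) hits.

    Assumption: a local dict stands in for the distributed seed index.
    """
    index = defaultdict(list)
    for target_id, seq in targets.items():
        for offset in range(len(seq) - k + 1):
            index[seq[offset:offset + k]].append((target_id, offset))
    return index


def extend_exact(query, q_pos, target, t_pos, k):
    """Grow a seed hit left and right while characters match (ungapped, assumed)."""
    q_start, t_start = q_pos, t_pos
    while q_start > 0 and t_start > 0 and query[q_start - 1] == target[t_start - 1]:
        q_start -= 1
        t_start -= 1
    q_end, t_end = q_pos + k, t_pos + k
    while q_end < len(query) and t_end < len(target) and query[q_end] == target[t_end]:
        q_end += 1
        t_end += 1
    return q_start, t_start, q_end - q_start


def align(query, targets, index, k):
    """Return the set of (target_id, query_start, target_start, length) exact matches."""
    hits = set()
    for q_pos in range(len(query) - k + 1):
        seed = query[q_pos:q_pos + k]
        # .get avoids inserting empty entries into the defaultdict on a miss.
        for target_id, t_pos in index.get(seed, ()):
            q_start, t_start, length = extend_exact(
                query, q_pos, targets[target_id], t_pos, k
            )
            hits.add((target_id, q_start, t_start, length))
    return hits


if __name__ == "__main__":
    targets = {"t0": "ACGTACGGA", "t1": "TTGACGGAT"}
    index = build_seed_index(targets, k=4)
    for hit in sorted(align("TACGGAT", targets, index, k=4)):
        print(hit)
```

Several seeds from the same query usually extend to the same match, which is why the hits are collected in a set. In a distributed setting each `index.get` call would become a remote lookup, which is the communication cost that the caching schemes mentioned in the abstract aim to reduce.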