A machine learning approach for accelerating DNA sequence analysis
- 26 June 2016
- journal article
- research article
- Published by SAGE Publications in The International Journal of High Performance Computing Applications
- Vol. 32 (3), 363-379
- https://doi.org/10.1177/1094342016654214
Abstract
DNA sequence analysis is a data- and compute-intensive problem and therefore demands suitable parallel computing resources and algorithms. In this paper, we describe an optimized approach for DNA sequence analysis on a heterogeneous platform that is accelerated with the Intel Xeon Phi. Such platforms commonly comprise one or two general-purpose host central processing units (CPUs) and one or more Xeon Phi devices. We present a parallel algorithm that shares the work of DNA sequence analysis between the host CPUs and the Xeon Phi device to reduce the overall analysis time. For automatic worksharing we use a supervised machine learning approach, which predicts the performance of DNA sequence analysis on the host and device and accordingly maps fractions of the DNA sequence to the host and device. We evaluate our approach empirically using real-world DNA segments from humans and various animals on a heterogeneous platform that comprises two 12-core Intel Xeon E5 CPUs and an Intel Xeon Phi 7120P device with 61 cores.
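The worksharing idea in the abstract (predict performance on host and device, then map fractions of the sequence accordingly) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `host_fraction` and `partition` are hypothetical, and the predicted costs are assumed to be plain numbers that some trained model has already produced. The split simply gives each side work in proportion to its predicted throughput, so that both would finish at about the same time.

```python
from typing import Tuple


def host_fraction(host_cost: float, device_cost: float) -> float:
    """Return the share of the input the host should process.

    host_cost and device_cost are assumed predictions (e.g. seconds per
    megabyte) from a performance model; lower means faster.
    """
    if host_cost <= 0 or device_cost <= 0:
        raise ValueError("predicted costs must be positive")
    host_rate = 1.0 / host_cost      # predicted throughput of the host
    device_rate = 1.0 / device_cost  # predicted throughput of the device
    return host_rate / (host_rate + device_rate)


def partition(sequence: str, fraction: float) -> Tuple[str, str]:
    """Split a sequence into a host part and a device part."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    cut = round(len(sequence) * fraction)
    return sequence[:cut], sequence[cut:]


if __name__ == "__main__":
    # Assumed predictions: the device is twice as fast as the host.
    share = host_fraction(host_cost=2.0, device_cost=1.0)
    host_part, device_part = partition("ACGTACGTACGT", share)
    print(len(host_part), len(device_part))
```

In this sketch a device predicted to be twice as fast as the host receives two thirds of the sequence. A real analysis that matches patterns across the cut point would also need to handle the boundary between the two parts, which is left out here.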