A GPU Implementation of Fast Parallel Markov Clustering in Bioinformatics Using EllPACK-R Sparse Data Format

Abstract
Massively parallel computing on graphics processing units (GPUs), which runs tens of thousands of parallel threads across hundreds of streaming processors, has gained broad popularity and attracted researchers in a wide range of application areas, including finance, computer-aided engineering, computational fluid dynamics, game physics, numerics, science, medical imaging, and the life sciences, among them molecular biology and bioinformatics. Meanwhile, the Markov clustering algorithm (MCL) has become one of the most effective and highly cited methods for detecting and analyzing communities (clusters) within interaction network datasets in many real-world problems, such as social, technological, or biological networks, including protein-protein interaction networks. However, as datasets grow larger, the computation time of the MCL algorithm grows with them. GPU computing is therefore an interesting and challenging way to attempt to improve MCL performance. In this poster paper we introduce our improvement of MCL performance, called CUDA-MCL, which is based on the ELLPACK-R sparse data format and uses GPU computing with NVIDIA's Compute Unified Device Architecture (CUDA). Our results show a significant improvement in performance with CUDA-MCL, and because low-cost GPU devices are widely available on the market today, this implementation enables large-scale parallel computation on off-the-shelf desktop machines. Moreover, GPU computing approaches may significantly change the way bioinformaticians and biologists compute and interact with their data.
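
For readers unfamiliar with ELLPACK-R, the following is a minimal illustrative sketch of the storage layout, written in plain Python rather than CUDA. It is not the CUDA-MCL code; the function names (to_ellpack_r, spmv_ellpack_r) and the example matrix are assumptions made only for illustration. ELLPACK-R stores the nonzero values and their column indices in padded, column-major arrays and adds an array rl that holds the number of nonzeros in each row, so that a GPU thread assigned to a row can stop before it reaches the padding.

```python
def to_ellpack_r(dense):
    """Convert a dense matrix (list of rows) to ELLPACK-R arrays (illustrative helper)."""
    n_rows = len(dense)
    # rl[i] = number of nonzero entries in row i (the extra array that ELLPACK-R adds)
    rl = [sum(1 for v in row if v != 0) for row in dense]
    width = max(rl) if rl else 0
    # Column-major layout: the k-th nonzero of row i is stored at k * n_rows + i,
    # so that on a GPU neighbouring threads (rows) read neighbouring addresses.
    values = [0.0] * (width * n_rows)
    cols = [0] * (width * n_rows)
    for i, row in enumerate(dense):
        k = 0
        for j, v in enumerate(row):
            if v != 0:
                values[k * n_rows + i] = float(v)
                cols[k * n_rows + i] = j
                k += 1
    return values, cols, rl, n_rows


def spmv_ellpack_r(values, cols, rl, n_rows, x):
    """Sparse matrix-vector product y = A x over ELLPACK-R arrays."""
    y = [0.0] * n_rows
    for i in range(n_rows):        # on a GPU, each row i would map to one thread
        acc = 0.0
        for k in range(rl[i]):     # rl[i] ends the loop before the zero padding
            idx = k * n_rows + i
            acc += values[idx] * x[cols[idx]]
        y[i] = acc
    return y


# Small hypothetical example matrix, not taken from the paper.
A = [[1, 0, 2],
     [0, 3, 0],
     [4, 5, 6]]
values, cols, rl, n_rows = to_ellpack_r(A)
y = spmv_ellpack_r(values, cols, rl, n_rows, [1.0, 2.0, 3.0])
```

The outer loop over rows is the part a GPU would run in parallel. The same row-wise traversal is a natural building block for the sparse matrix products in the expansion step of MCL, although how CUDA-MCL organizes that step is not specified in this abstract.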