GPU-FS-kNN: A Software Tool for Fast and Scalable kNN Computation Using GPUs
Open Access
- 28 August 2012
- journal article
- research article
- Published by Public Library of Science (PLoS) in PLOS ONE
- Vol. 7 (8), e44000
- https://doi.org/10.1371/journal.pone.0044000
Abstract
The analysis of biological networks has become a major challenge due to the recent development of high-throughput techniques that are rapidly producing very large data sets. The exploding volumes of biological data are craving for extreme computational power and special computing facilities (i.e. super-computers). An inexpensive solution, such as General Purpose computation based on Graphics Processing Units (GPGPU), can be adapted to tackle this challenge, but the limitation of the device internal memory can pose a new problem of scalability. An efficient data and computational parallelism with partitioning is required to provide a fast and scalable solution to this problem. We propose an efficient parallel formulation of the k-Nearest Neighbour (kNN) search problem, which is a popular method for classifying objects in several fields of research, such as pattern recognition, machine learning and bioinformatics. Being very simple and straightforward, the performance of the kNN search degrades dramatically for large data sets, since the task is computationally intensive. The proposed approach is not only fast but also scalable to large-scale instances. Based on our approach, we implemented a software tool GPU-FS-kNN (GPU-based Fast and Scalable k-Nearest Neighbour) for CUDA enabled GPUs. The basic approach is simple and adaptable to other available GPU architectures. We observed speed-ups of 50–60 times compared with CPU implementation on a well-known breast microarray study and its associated data sets. Our GPU-based Fast and Scalable k-Nearest Neighbour search technique (GPU-FS-kNN) provides a significant performance improvement for nearest neighbour computation in large-scale networks. Source code and the software tool is available under GNU Public License (GPL) at https://sourceforge.net/p/gpufsknn/.This publication has 29 references indexed in Scilit:
- An Iron Regulatory Gene Signature Predicts Outcome in Breast CancerCancer Research, 2011
- Differences in Abundances of Cell-Signalling Proteins in Blood Reveal Novel Biomarkers for Early Detection Of Clinical Alzheimer's DiseasePLOS ONE, 2011
- Ferroportin and hepcidin: a new hope in diagnosis, prognosis, and therapy for breast cancerBreast Cancer Research, 2010
- Towards a unifying, systems biology understanding of large-scale cellular death and destruction caused by poorly liganded iron: Parkinson’s, Huntington’s, Alzheimer’s, prions, bactericides, chemical toxicology and others as examplesArchives of Toxicology, 2010
- A survey of visualization tools for biological network analysisBioData Mining, 2008
- Nearest Neighbor Networks: clustering expression data based on gene neighborhoodsBMC Bioinformatics, 2007
- Distributed computation of the knn graph for large high-dimensional point setsJournal of Parallel and Distributed Computing, 2007
- Antiestrogen resistance in breast cancer and the role of estrogen receptor signalingOncogene, 2003
- A Gene-Expression Signature as a Predictor of Survival in Breast CancerThe New England Journal of Medicine, 2002
- Gene expression profiling predicts clinical outcome of breast cancerNature, 2002