Scalable GPU graph traversal
- 25 February 2012
- conference paper
- conference paper
- Published by Association for Computing Machinery (ACM)
- Vol. 47 (8), 117-128
- https://doi.org/10.1145/2145816.2145832
Abstract
Breadth-first search (BFS) is a core primitive for graph traversal and a basis for many higher-level graph analysis algorithms. It is also representative of a class of parallel computations whose memory accesses and work distribution are both irregular and data-dependent. Recent work has demonstrated the plausibility of GPU sparse graph traversal, but has tended to focus on asymptotically inefficient algorithms that perform poorly on graphs with non-trivial diameter. We present a BFS parallelization focused on fine-grained task management constructed from efficient prefix sum that achieves an asymptotically optimal O(|V|+|E|) work complexity. Our implementation delivers excellent performance on diverse graphs, achieving traversal rates in excess of 3.3 billion and 8.3 billion traversed edges per second using single and quad-GPU configurations, respectively. This level of performance is several times faster than state-of-the-art implementations both CPU and GPU platforms.Keywords
This publication has 17 references indexed in Scilit:
- HIGH PERFORMANCE AND SCALABLE RADIX SORTING: A CASE STUDY OF IMPLEMENTING DYNAMIC PARALLELISM FOR GPU COMPUTINGParallel Processing Letters, 2011
- Accelerating CUDA graph algorithms at maximum warpPublished by Association for Computing Machinery (ACM) ,2011
- An effective GPU implementation of breadth-first searchPublished by Association for Computing Machinery (ACM) ,2010
- Implementing sparse matrix-vector multiplication on throughput-oriented processorsPublished by Association for Computing Machinery (ACM) ,2009
- Fast scan algorithms on graphics processorsPublished by Association for Computing Machinery (ACM) ,2008
- A Scalable Distributed Parallel Breadth-First Search Algorithm on BlueGene/LPublished by Institute of Electrical and Electronics Engineers (IEEE) ,2005
- On the Architectural Requirements for Efficient Execution of Graph AlgorithmsPublished by Institute of Electrical and Electronics Engineers (IEEE) ,2005
- High-probability parallel transitive closure algorithmsPublished by Association for Computing Machinery (ACM) ,1990
- Scans as primitive parallel operationsIEEE Transactions on Computers, 1989
- Data parallel algorithmsCommunications of the ACM, 1986