Gamma: leveraging Gustavson’s algorithm to accelerate sparse matrix multiplication

17 April 2021

conference paper
conference paper
Published by Association for Computing Machinery (ACM)

https://doi.org/10.1145/3445814.3446702

Abstract

Sparse matrix-sparse matrix multiplication (spMspM) is at the heart of a wide range of scientific and machine learning applications. spMspM is inefficient on general-purpose architectures, making accelerators attractive. However, prior spMspM accelerators use inner- or outer-product dataflows that suffer poor input or output reuse, leading to high traffic and poor performance. These prior accelerators have not explored Gustavson's algorithm, an alternative spMspM dataflow that does not suffer from these problems but features irregular memory access patterns that prior accelerators do not support. We present GAMMA, an spMspM accelerator that uses Gustavson's algorithm to address the challenges of prior work. GAMMA performs spMspM's computation using specialized processing elements with simple high-radix mergers, and performs many merges in parallel to achieve high throughput. GAMMA uses a novel on-chip storage structure that combines features of both caches and explicitly managed buffers. This structure captures Gustavson's irregular reuse patterns and streams thousands of concurrent sparse fibers (i.e., lists of coordinates and values for rows or columns) with explicitly decoupled data movement. GAMMA features a new dynamic scheduling algorithm to achieve high utilization despite irregularity. We also present new preprocessing algorithms that boost GAMMA's efficiency and versatility. As a result, GAMMA outperforms prior accelerators by gmean 2.1x, and reduces memory traffic by gmean 2.2x and by up to 13x.

Keywords

Funding Information

DARPA SDH (HR0011-18-3-0007)
Semiconductor Research Corporation (2020-AH-2985)

This publication has 32 references indexed in Scilit:

Compressing Graphs and Indexes with Recursive Graph Bisection
Published by Association for Computing Machinery (ACM) ,2016
Speedup Graph Processing by Graph Ordering
Published by Association for Computing Machinery (ACM) ,2016
Stash
Published by Association for Computing Machinery (ACM) ,2015
Parallel Triangle Counting and Enumeration Using Matrix Algebra
Published by Institute of Electrical and Electronics Engineers (IEEE) ,2015
Intel Math Kernel Library
Published by Springer Science and Business Media LLC ,2014
DianNao
ACM SIGPLAN Notices, 2014
Colored intersection searching via sparse rectangular matrix multiplication
Published by Association for Computing Machinery (ACM) ,2006
OSKI: A library of automatically tuned sparse matrix kernels
Journal of Physics: Conference Series, 2005
Improving performance of sparse matrix-vector multiplication
Published by Association for Computing Machinery (ACM) ,1999
Reducing the bandwidth of sparse symmetric matrices
Published by Association for Computing Machinery (ACM) ,1969

Cited by 52 articles