SIGMA: A Sparse and Irregular GEMM Accelerator with Flexible Interconnects for DNN Training

Abstract
The advent of Deep Learning (DL) has radically transformed the computing industry across the entire spectrum from algorithms to circuits. As myriad application domains embrace DL, it has become synonymous with a genre of workloads spanning vision, speech, language, recommendation, robotics, and games. The key compute kernel within most DL workloads is the general matrix-matrix multiplication (GEMM), which appears frequently during both the forward pass (inference and training) and the backward pass (training). GEMMs are a natural target for hardware acceleration to speed up training, and have motivated 2D systolic architectures such as NVIDIA's Tensor Cores and Google's Tensor Processing Unit (TPU). Unfortunately, emerging GEMMs in DL are highly irregular and sparse, which leads to poor data mappings on systolic architectures. This paper proposes SIGMA, a flexible and scalable architecture that offers high utilization of all its processing elements (PEs) regardless of kernel shape and sparsity. At the core of SIGMA is a novel reduction tree microarchitecture named the Forwarding Adder Network (FAN). SIGMA performs 5.7x better than systolic array architectures on irregular sparse matrices, and roughly 3x better than state-of-the-art sparse accelerators. We demonstrate an instance of SIGMA that sustains 10.8 TFLOPS across arbitrary levels of sparsity, with a 65.10 mm^2 area and 22.33 W power footprint on a 28 nm process.
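To make the motivation concrete, the sketch below is a minimal, illustrative utilization model (not the paper's cost model) of mapping an M x K x N GEMM onto a fixed rows x cols systolic array. It assumes a simple blocked mapping in which K is streamed temporally, so only M and N determine spatial occupancy; partial edge tiles leave PEs idle, and multiplications involving zero operands do no useful work. The function name and parameters are hypothetical.

```python
import math

def systolic_pe_utilization(M, N, rows=128, cols=128, density=1.0):
    """Rough PE-utilization estimate for an M x K x N GEMM on a fixed
    rows x cols systolic array (illustrative assumption, not SIGMA's
    model). K is streamed through time, so spatial occupancy depends
    only on M and N; 'density' is the fraction of nonzero operands.
    """
    tiles_m = math.ceil(M / rows)
    tiles_n = math.ceil(N / cols)
    provisioned = tiles_m * tiles_n * rows * cols  # PE slots allocated across all tiles
    occupied = M * N                               # PE slots that hold real output elements
    return (occupied / provisioned) * density      # fraction of PEs doing useful MACs

# A tall-skinny GEMM shape, common in DL training workloads:
print(systolic_pe_utilization(M=1024, N=8))               # 0.0625  -> only 6.25% of PEs busy
# The same shape with 80% sparsity (20% density):
print(systolic_pe_utilization(M=1024, N=8, density=0.2))  # 0.0125  -> 1.25% effective utilization
```

Under these assumptions, an irregular (tall-skinny) shape alone strands most of a 128x128 array, and sparsity compounds the loss, which is the utilization gap SIGMA's flexible interconnects and FAN reduction tree are designed to close.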
