An Accelerator for Sparse Convolutional Neural Networks Leveraging Systolic General Matrix-matrix Multiplication

Open Access

25 May 2022

Journal article (research article)
Published by Association for Computing Machinery (ACM) in ACM Transactions on Architecture and Code Optimization

Vol. 19 (3), 1-26
https://doi.org/10.1145/3532863

Abstract

This paper proposes a novel hardware accelerator for inference with sparse convolutional neural networks (CNNs). It couples a hardware unit that performs the Image to Column (Im2Col) transformation of the input feature map with a systolic array-based general matrix-matrix multiplication (GEMM) unit. Our design carefully overlaps the Im2Col transformation with the GEMM computation to maximize parallelism. We propose a novel design for the Im2Col unit that uses a set of distributed local memories connected by a ring network, which improves energy efficiency and latency by streaming the input feature map only once. The systolic array-based GEMM unit can be dynamically configured either as multiple GEMM units with square systolic arrays or as a single GEMM unit with a tall systolic array. This reconfigurability enables effective pipelining of the Im2Col and GEMM operations and attains high processing element utilization across a wide range of CNNs. Further, the accelerator is sparsity-aware: it improves performance and energy efficiency by effectively mapping the sparse feature maps and weights to the processing elements, skipping ineffectual operations and unnecessary data movements involving zeros. Our prototype, SPOTS, is on average 2.16×, 1.74×, and 1.63× faster than Gemmini, Eyeriss, and Sparse-PE, respectively, which are prior hardware accelerators for dense and sparse CNNs. SPOTS is also 78× and 12× more energy-efficient than CPU and GPU implementations, respectively.
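The abstract describes convolution lowered to an Im2Col transformation followed by GEMM, with operations on zeros skipped. As a rough software illustration only, the Python sketch below models that arithmetic under stated assumptions: no padding, nested-list layouts (C×H×W feature maps, K×C×KH×KW filters), and a naive per-element zero check standing in for whatever sparse encoding SPOTS actually uses. All function names are hypothetical. The sketch does not model the ring-connected local memories, the systolic array timing, or the reconfigurable array shapes.

```python
from typing import List, Tuple

# Hypothetical layouts (assumptions, not from the paper):
# feature map is C x H x W, filters are K x C x KH x KW.
FeatureMap = List[List[List[int]]]
Filters = List[List[List[List[int]]]]
Matrix = List[List[int]]


def im2col(ifmap: FeatureMap, kh: int, kw: int, stride: int = 1) -> Tuple[Matrix, int, int]:
    """Lower the feature map to a (C*KH*KW) x (OH*OW) patch matrix (no padding)."""
    c, h, w = len(ifmap), len(ifmap[0]), len(ifmap[0][0])
    oh = (h - kh) // stride + 1
    ow = (w - kw) // stride + 1
    patches: Matrix = []
    for ch in range(c):
        for i in range(kh):
            for j in range(kw):
                # One row per (channel, kernel row, kernel col); one column per output pixel.
                patches.append([ifmap[ch][y * stride + i][x * stride + j]
                                for y in range(oh) for x in range(ow)])
    return patches, oh, ow


def flatten_filters(filters: Filters) -> Matrix:
    """Flatten each filter to one row, using the same (channel, i, j) order as im2col."""
    return [[f[ch][i][j]
             for ch in range(len(f))
             for i in range(len(f[0]))
             for j in range(len(f[0][0]))]
            for f in filters]


def sparse_gemm(a: Matrix, b: Matrix) -> Tuple[Matrix, int]:
    """Compute a @ b, skipping every multiply-accumulate with a zero operand.

    Returns the product and the number of MACs actually performed.
    """
    n = len(b[0])
    out = [[0] * n for _ in range(len(a))]
    macs = 0
    for r, row in enumerate(a):
        nonzero_weights = [(k, v) for k, v in enumerate(row) if v != 0]  # skip zero weights
        for k, v in nonzero_weights:
            for col, x in enumerate(b[k]):
                if x != 0:  # skip zero activations
                    out[r][col] += v * x
                    macs += 1
    return out, macs


def conv2d_im2col(ifmap: FeatureMap, filters: Filters, stride: int = 1) -> Tuple[List[Matrix], int]:
    """Convolution as Im2Col followed by a zero-skipping GEMM."""
    kh, kw = len(filters[0][0]), len(filters[0][0][0])
    patches, oh, ow = im2col(ifmap, kh, kw, stride)
    out, macs = sparse_gemm(flatten_filters(filters), patches)
    # Each GEMM output row is one output channel; fold it back to OH x OW.
    return [[row[y * ow:(y + 1) * ow] for y in range(oh)] for row in out], macs


def conv2d_direct(ifmap: FeatureMap, filters: Filters, stride: int = 1) -> List[Matrix]:
    """Reference sliding-window convolution used to check the lowered version."""
    c, h, w = len(ifmap), len(ifmap[0]), len(ifmap[0][0])
    kh, kw = len(filters[0][0]), len(filters[0][0][0])
    oh, ow = (h - kh) // stride + 1, (w - kw) // stride + 1
    return [[[sum(f[ch][i][j] * ifmap[ch][y * stride + i][x * stride + j]
                  for ch in range(c) for i in range(kh) for j in range(kw))
              for x in range(ow)] for y in range(oh)] for f in filters]


if __name__ == "__main__":
    ifmap = [[[1, 0, 2, 0], [0, 3, 0, 1], [4, 0, 0, 5], [0, 6, 7, 0]]]  # C=1, 4x4
    filters = [[[[1, 0], [0, -1]]], [[[0, 2], [0, 0]]]]                # K=2, C=1, 2x2
    ofmap, macs = conv2d_im2col(ifmap, filters)
    assert ofmap == conv2d_direct(ifmap, filters)
    dense_macs = len(filters) * len(ifmap) * 2 * 2 * 3 * 3
    print(f"MACs performed: {macs} of {dense_macs} dense")
```

For the toy 4×4 input, a dense GEMM would perform 72 multiply-accumulates (2 filters × 4 patch rows × 9 output pixels), while the zero-skipping loop performs 13 and still matches the direct sliding-window result. The lowering is what makes a GEMM unit applicable at all: once the patches are laid out as matrix columns, every filter becomes one row of an ordinary matrix product.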


This publication has 41 references indexed in Scilit:

Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks. IEEE Journal of Solid-State Circuits, 2016.
EIE: Efficient Inference Engine on Compressed Deep Neural Network. Institute of Electrical and Electronics Engineers (IEEE), 2016.
Cnvlutin. ACM SIGARCH Computer Architecture News, 2016.
ShiDianNao. Association for Computing Machinery (ACM), 2015.
Going deeper with convolutions. Institute of Electrical and Electronics Engineers (IEEE), 2015.
A Survey of In-Band Full-Duplex Transmission: From the Perspective of PHY and MAC Layers. IEEE Communications Surveys & Tutorials, 2015.
Impact of TSV and Device Scaling on the Quality of 3D ICs. Springer Science and Business Media LLC, 2015.
DaDianNao: A Machine-Learning Supercomputer. Institute of Electrical and Electronics Engineers (IEEE), 2014.
FreePDK: An Open-Source Variation-Aware Design Kit. Institute of Electrical and Electronics Engineers (IEEE), 2007.
An updated set of basic linear algebra subprograms (BLAS). ACM Transactions on Mathematical Software, 2002.

Cited by 8 articles