Maximizing CNN Accelerator Efficiency Through Resource Partitioning

24 June 2017

journal article
conference paper
Published by Association for Computing Machinery (ACM) in ACM SIGARCH Computer Architecture News

Vol. 45 (2), 535-547
https://doi.org/10.1145/3140659.3080221

Abstract

Convolutional neural networks (CNNs) are revolutionizing machine learning, but they present significant computational challenges. Recently, many FPGA-based accelerators have been proposed to improve the performance and efficiency of CNNs. Current approaches construct a single processor that computes the CNN layers one at a time; the processor is optimized to maximize the throughput at which the collection of layers is computed. However, this approach leads to inefficient designs because the same processor structure is used to compute CNN layers of radically varying dimensions. We present a new CNN accelerator paradigm and an accompanying automated design methodology that partitions the available FPGA resources into multiple processors, each of which is tailored for a different subset of the CNN convolutional layers. Using the same FPGA resources as a single large processor, multiple smaller specialized processors increase computational efficiency and lead to a higher overall throughput. Our design methodology achieves 3.8x higher throughput than the state-of-the-art approach when evaluating the popular AlexNet CNN on a Xilinx Virtex-7 FPGA. For the more recent SqueezeNet and GoogLeNet, the speedups are 2.2x and 2.0x, respectively.
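The partitioning argument in the abstract can be illustrated with a toy cost model. The sketch below is not the paper's methodology or optimizer. It assumes a processor is a Tn x Tm multiply-accumulate array that unrolls input and output channels, that a layer takes ceil(N/Tn) * ceil(M/Tm) * R * C * K * K cycles, and that two processors run concurrently so the slower one sets the throughput. The layer dimensions, the 1024-multiplier budget, and all function names are hypothetical, and memory bandwidth and on-chip buffers are ignored.

```python
def ceil_div(a, b):
    """Integer ceiling division."""
    return -(-a // b)

def cycles(layer, tn, tm):
    """Cycles for one conv layer on a tn x tm multiply-accumulate array (assumed model)."""
    n, m, r, c, k = layer  # in-channels, out-channels, rows, cols, kernel size
    return ceil_div(n, tn) * ceil_div(m, tm) * r * c * k * k

def best_shape(layers, budget):
    """Exhaustively pick (total_cycles, tn, tm) with tn * tm <= budget
    for one processor that runs the given layers one after another."""
    max_n = max(layer[0] for layer in layers)
    max_m = max(layer[1] for layer in layers)
    best = None
    for tn in range(1, min(max_n, budget) + 1):
        for tm in range(1, min(max_m, budget // tn) + 1):
            total = sum(cycles(layer, tn, tm) for layer in layers)
            if best is None or total < best[0]:
                best = (total, tn, tm)
    return best

def best_two_processors(layer_a, layer_b, budget):
    """Give each layer its own processor, splitting the same budget.
    Both run concurrently, so the slower one bounds throughput."""
    best = None
    for tn in range(1, layer_a[0] + 1):
        for tm in range(1, layer_a[1] + 1):
            rest = budget - tn * tm  # multipliers left for the second processor
            if rest < 1:
                continue
            cycles_b, tn_b, tm_b = best_shape([layer_b], rest)
            epoch = max(cycles(layer_a, tn, tm), cycles_b)
            if best is None or epoch < best[0]:
                best = (epoch, (tn, tm), (tn_b, tm_b))
    return best

# Hypothetical layers: (N, M, R, C, K). Not taken from AlexNet or the paper.
layer_a = (3, 64, 32, 32, 3)    # few input channels, large feature map
layer_b = (64, 64, 16, 16, 3)   # many input channels, small feature map
BUDGET = 1024                   # assumed multiplier budget

single = best_shape([layer_a, layer_b], BUDGET)
multi = best_two_processors(layer_a, layer_b, BUDGET)
print("single processor (cycles, Tn, Tm):", single)
print("two processors (cycles, shape A, shape B):", multi)
print("throughput ratio:", single[0] / multi[0])
```

Under these made-up dimensions, the best single array is 16 x 64 and needs 18432 cycles per image, because the three-channel layer can use only 3 of its 16 input lanes. Splitting the same 1024 multipliers into a 3 x 64 array and a 13 x 64 array brings the bottleneck down to 11520 cycles, a ratio of 1.6. The numbers only show the mechanism; they say nothing about the speedups reported in the abstract.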

Cited by 67 articles