C-brain
- 5 June 2016
- conference paper
- Published by Association for Computing Machinery (ACM) in Proceedings of the 53rd Annual Design Automation Conference
- pp. 123:1-123:6
- https://doi.org/10.1145/2897937.2897995
Abstract
Convolutional neural network (CNN) accelerators have been proposed as an efficient hardware solution for deep-learning applications, which are known to be both compute- and memory-intensive. Although the most advanced CNN accelerators can deliver high computational throughput, their performance is highly unstable: once a design is changed to accommodate a new network with different parameters, such as layer count or kernel size, the fixed hardware structure may no longer match the data flows well. Consequently, the accelerator fails to deliver high performance because either logic resources or memory bandwidth are underutilized. To overcome this problem, we propose a novel deep learning accelerator that offers multiple types of data-level parallelism: inter-kernel, intra-kernel, and hybrid. Our design can adaptively switch among the three types of parallelism and the corresponding data tiling schemes to dynamically match different networks, or even different layers of a single network. Regardless of the hardware configuration or network type, the proposed network mapping strategy ensures optimal performance and energy efficiency. Compared with previous state-of-the-art NN accelerators, our design achieves a speedup of 4.0x-8.3x for some layers of well-known large-scale CNNs. Over the whole forward-propagation phase of a network, it achieves on average 28.04% PE energy savings and 90.3% on-chip memory energy savings.
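The abstract's core idea is that no single parallelism scheme keeps the processing elements (PEs) busy for every layer shape, so the accelerator should pick a scheme per layer. The sketch below is a hypothetical illustration of that selection logic, not the paper's actual mapping algorithm: all function names and the utilization model (PEs busy vs. PEs available) are assumptions made for the example.

```python
# Hypothetical sketch (not the paper's actual algorithm): choosing, per layer,
# among inter-kernel, intra-kernel, and hybrid data-level parallelism so that
# a fixed pool of processing elements (PEs) stays well utilized.

def pe_utilization(scheme, num_kernels, kernel_size, num_pes):
    """Estimate what fraction of PEs a scheme keeps busy for one layer.

    scheme      -- "inter" (one kernel per PE), "intra" (one kernel spread
                   across PEs, one tap each), or "hybrid" (both at once)
    num_kernels -- number of convolution kernels (output channels)
    kernel_size -- kernel width/height (assumed square)
    num_pes     -- total processing elements in the accelerator
    """
    taps = kernel_size * kernel_size  # multiply-accumulate taps per kernel
    if scheme == "inter":
        # Parallelize across kernels: each PE computes a whole kernel.
        busy = min(num_kernels, num_pes)
    elif scheme == "intra":
        # Parallelize within one kernel: each PE computes one tap.
        busy = min(taps, num_pes)
    else:
        # Hybrid: spread several kernels, each over several PEs.
        busy = min(num_kernels * taps, num_pes)
    return busy / num_pes

def best_scheme(num_kernels, kernel_size, num_pes):
    """Pick the scheme with the highest estimated PE utilization."""
    return max(("inter", "intra", "hybrid"),
               key=lambda s: pe_utilization(s, num_kernels, kernel_size, num_pes))
```

Under this toy model, a layer with a few large kernels (e.g. 4 kernels of 11x11 on 64 PEs) favors intra-kernel parallelism, many small kernels (e.g. 256 kernels of 3x3) favor inter-kernel parallelism, and intermediate shapes fall to the hybrid scheme, mirroring the layer-by-layer adaptivity the abstract describes.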
This publication has 15 references indexed in Scilit:
- ShiDianNao. Published by Association for Computing Machinery (ACM), 2015
- Going deeper with convolutions. Published by Institute of Electrical and Electronics Engineers (IEEE), 2015
- Deep learning. Nature, 2015
- ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision, 2015
- Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks. Published by Association for Computing Machinery (ACM), 2015
- 4.6 A 1.93TOPS/W scalable deep learning/inference processor with tetra-parallel MIMD architecture for big-data applications. Published by Institute of Electrical and Electronics Engineers (IEEE), 2015
- Caffe. Published by Association for Computing Machinery (ACM), 2014
- DianNao. ACM SIGPLAN Notices, 2014
- NeuFlow: A runtime reconfigurable dataflow processor for vision. Published by Institute of Electrical and Electronics Engineers (IEEE), 2011
- A dynamically configurable coprocessor for convolutional neural networks. ACM SIGARCH Computer Architecture News, 2010