C-brain
- 5 June 2016
- conference paper
- Published by Association for Computing Machinery (ACM) in Proceedings of the 53rd Annual Design Automation Conference
- pp. 123:1-123:6
- https://doi.org/10.1145/2897937.2897995
Abstract
Convolutional neural network (CNN) accelerators have been proposed as an efficient hardware solution for deep-learning applications, which are known to be both compute- and memory-intensive. Although the most advanced CNN accelerators can deliver high computational throughput, their performance is highly unstable: once a design is changed to accommodate a new network with different parameters, such as layer count or kernel size, the fixed hardware structure may no longer match the data flows well. Consequently, the accelerator fails to deliver high performance because either logic resources or memory bandwidth are underutilized. To overcome this problem, we propose a novel deep learning accelerator that offers multiple types of data-level parallelism: inter-kernel, intra-kernel, and hybrid. Our design can adaptively switch among the three types of parallelism and the corresponding data tiling schemes to dynamically match different networks, or even different layers of a single network. Regardless of the hardware configuration or network type, the proposed network mapping strategy ensures optimal performance and energy efficiency. Compared with previous state-of-the-art NN accelerators, our design achieves a speedup of 4.0x-8.3x for some layers of well-known large-scale CNNs. Over the whole forward-propagation phase of a network, it achieves on average 28.04% PE energy savings and 90.3% on-chip memory energy savings.
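The abstract's core idea is that no single parallelism scheme keeps the processing elements (PEs) busy for every layer shape, so the accelerator should pick a scheme per layer. The sketch below is a hypothetical illustration of that selection logic, not the paper's actual mapping algorithm: all function names and the utilization model (PEs busy vs. PEs available) are assumptions made for the example.

```python
# Hypothetical sketch (not the paper's actual algorithm): choosing, per layer,
# among inter-kernel, intra-kernel, and hybrid data-level parallelism so that
# a fixed pool of processing elements (PEs) stays well utilized.

def pe_utilization(scheme, num_kernels, kernel_size, num_pes):
    """Estimate what fraction of PEs a scheme keeps busy for one layer.

    scheme      -- "inter" (one kernel per PE), "intra" (one kernel spread
                   across PEs, one tap each), or "hybrid" (both at once)
    num_kernels -- number of convolution kernels (output channels)
    kernel_size -- kernel width/height (assumed square)
    num_pes     -- total processing elements in the accelerator
    """
    taps = kernel_size * kernel_size  # multiply-accumulate taps per kernel
    if scheme == "inter":
        # Parallelize across kernels: each PE computes a whole kernel.
        busy = min(num_kernels, num_pes)
    elif scheme == "intra":
        # Parallelize within one kernel: each PE computes one tap.
        busy = min(taps, num_pes)
    else:
        # Hybrid: spread several kernels, each over several PEs.
        busy = min(num_kernels * taps, num_pes)
    return busy / num_pes

def best_scheme(num_kernels, kernel_size, num_pes):
    """Pick the scheme with the highest estimated PE utilization."""
    return max(("inter", "intra", "hybrid"),
               key=lambda s: pe_utilization(s, num_kernels, kernel_size, num_pes))
```

Under this toy model, a layer with a few large kernels (e.g. 4 kernels of 11x11 on 64 PEs) favors intra-kernel parallelism, many small kernels (e.g. 256 kernels of 3x3) favor inter-kernel parallelism, and intermediate shapes fall to the hybrid scheme, mirroring the layer-by-layer adaptivity the abstract describes.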
This publication has 15 references indexed in Scilit:
- ShiDianNao. Published by Association for Computing Machinery (ACM), 2015
- Going deeper with convolutions. Published by Institute of Electrical and Electronics Engineers (IEEE), 2015
- Deep learning. Nature, 2015
- ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision, 2015
- Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks. Published by Association for Computing Machinery (ACM), 2015
- 4.6 A 1.93TOPS/W scalable deep learning/inference processor with tetra-parallel MIMD architecture for big-data applications. Published by Institute of Electrical and Electronics Engineers (IEEE), 2015
- Caffe. Published by Association for Computing Machinery (ACM), 2014
- DianNao. ACM SIGPLAN Notices, 2014
- NeuFlow: A runtime reconfigurable dataflow processor for vision. Published by Institute of Electrical and Electronics Engineers (IEEE), 2011
- A dynamically configurable coprocessor for convolutional neural networks. ACM SIGARCH Computer Architecture News, 2010