DianNao

24 February 2014

conference paper
Published by Association for Computing Machinery (ACM) in ACM SIGPLAN Notices

Vol. 49 (4), 269-284
https://doi.org/10.1145/2541940.2541967

Abstract

Machine-learning tasks are becoming pervasive in a broad range of domains, and in a broad range of systems (from embedded systems to data centers). At the same time, a small set of machine-learning algorithms (especially Convolutional and Deep Neural Networks, i.e., CNNs and DNNs) are proving to be state-of-the-art across many applications. As architectures evolve towards heterogeneous multi-cores composed of a mix of cores and accelerators, a machine-learning accelerator can achieve the rare combination of efficiency (due to the small number of target algorithms) and broad application scope. Until now, most machine-learning accelerator designs have focused on efficiently implementing the computational part of the algorithms. However, recent state-of-the-art CNNs and DNNs are characterized by their large size. In this study, we design an accelerator for large-scale CNNs and DNNs, with a special emphasis on the impact of memory on accelerator design, performance and energy. We show that it is possible to design an accelerator with a high throughput, capable of performing 452 GOP/s (key NN operations such as synaptic weight multiplications and neuron output additions) in a small footprint of 3.02 mm² and 485 mW; compared to a 128-bit 2 GHz SIMD processor, the accelerator is 117.87x faster, and it can reduce the total energy by 21.08x. The accelerator characteristics are obtained after layout at 65 nm. Such a high throughput in a small footprint can open up the usage of state-of-the-art machine-learning algorithms in a broad set of systems and for a broad set of applications.
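
The headline figures in the abstract can be combined into a few derived efficiency ratios. The sketch below is illustrative arithmetic only: the three input constants are the values quoted in the abstract, while the function name `derived_metrics` and the choice of ratios are assumptions made here, not metrics reported by the authors.

```python
# Figures quoted in the abstract (post-layout, 65 nm).
THROUGHPUT_GOPS = 452.0  # peak throughput in GOP/s
AREA_MM2 = 3.02          # footprint in mm^2
POWER_W = 0.485          # 485 mW expressed in watts


def derived_metrics(throughput_gops, area_mm2, power_w):
    """Return simple efficiency ratios (hypothetical helper, not from the paper)."""
    return {
        # Throughput delivered per watt of power.
        "gops_per_watt": throughput_gops / power_w,
        # Throughput delivered per square millimetre of silicon.
        "gops_per_mm2": throughput_gops / area_mm2,
        # W / (GOP/s) is nJ per operation; multiply by 1e3 for pJ.
        "pj_per_op": power_w / throughput_gops * 1e3,
    }


if __name__ == "__main__":
    for name, value in derived_metrics(THROUGHPUT_GOPS, AREA_MM2, POWER_W).items():
        print(f"{name}: {value:.2f}")
```

Under these inputs the ratios come out to roughly 932 GOP/s per watt, about 150 GOP/s per mm², and close to 1.07 pJ per operation. These assume the quoted throughput and power apply at the same operating point.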

Cited by 1006 articles