Cambricon-X: An accelerator for sparse neural networks
- 1 October 2016
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
Abstract
Neural networks (NNs) have been demonstrated to be useful in a broad range of applications such as image recognition, automatic translation, and advertisement recommendation. State-of-the-art NNs are known to be both computationally and memory intensive, due to their ever-increasing deep structure, i.e., multiple layers with massive numbers of neurons and connections (i.e., synapses). Sparse neural networks have emerged as an effective solution to reduce the amount of computation and memory required. Though existing NN accelerators are able to efficiently process dense and regular networks, they cannot benefit from the reduction of synaptic weights. In this paper, we propose a novel accelerator, Cambricon-X, to exploit the sparsity and irregularity of NN models for increased efficiency. The proposed accelerator features a PE-based architecture consisting of multiple Processing Elements (PEs). An Indexing Module (IM) efficiently selects and transfers needed neurons to connected PEs with reduced bandwidth requirement, while each PE stores irregular and compressed synapses for local computation in an asynchronous fashion. With 16 PEs, our accelerator is able to achieve at most 544 GOP/s in a small form factor (6.38 mm² and 954 mW at 65 nm). Experimental results over a number of representative sparse networks show that our accelerator achieves, on average, 7.23x speedup and 6.43x energy saving against the state-of-the-art NN accelerator.
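The abstract's indexing scheme can be illustrated in miniature: each output neuron keeps only its nonzero synaptic weights together with the indices of the input neurons they connect to, so only the "needed" neurons are gathered and multiplied. The sketch below is a hypothetical software analogue of that idea, not the paper's actual hardware datapath; the function name and the `(indices, weights)` representation are assumptions for illustration.

```python
import numpy as np

def sparse_layer(neurons, compressed_synapses):
    """Toy analogue of index-based neuron selection.

    neurons: 1-D array of input activations.
    compressed_synapses: one (indices, weights) pair per output neuron,
        holding only the nonzero synapses and the input positions they
        connect to (a compressed, CSR-like per-neuron format).
    """
    out = np.empty(len(compressed_synapses))
    for i, (idx, w) in enumerate(compressed_synapses):
        # The Indexing-Module role: gather only the needed input neurons
        # (neurons[idx]) and hand them to a PE, which multiplies them by
        # its locally stored compressed weights.
        out[i] = np.dot(neurons[idx], w)
    return out

# Toy example: 8 input neurons, 2 output neurons with sparse fan-in.
neurons = np.arange(8, dtype=float)
synapses = [
    (np.array([0, 3, 5]), np.array([1.0, 2.0, -1.0])),  # 3 of 8 weights kept
    (np.array([2, 7]),    np.array([0.5, 0.5])),        # 2 of 8 weights kept
]
print(sparse_layer(neurons, synapses))  # [1.0, 4.5]
```

Because only the indexed neurons are fetched, the work and bandwidth scale with the number of nonzero synapses rather than with the dense layer size, which is the efficiency argument the abstract makes for sparse networks.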
This publication has 26 references indexed in Scilit:
- ShiDianNao. Published by Association for Computing Machinery (ACM), 2015
- Caffe. Published by Association for Computing Machinery (ACM), 2014
- DianNao. ACM SIGPLAN Notices, 2014
- Communication Optimization of Iterative Sparse Matrix-Vector Multiply on GPUs and FPGAs. IEEE Transactions on Parallel and Distributed Systems, 2014
- A Massively Parallel Coprocessor for Convolutional Neural Networks. Published by Institute of Electrical and Electronics Engineers (IEEE), 2009
- Optimizing NUCA Organizations and Wiring Alternatives for Large Caches with CACTI 6.0. Published by Institute of Electrical and Electronics Engineers (IEEE), 2007
- The Impact of Arithmetic Representation on Implementing MLP-BP on FPGAs: A Study. IEEE Transactions on Neural Networks, 2007
- FPGA Implementation of a Pipelined On-Line Backpropagation. Journal of Signal Processing Systems, 2005
- Recognition, Mining and Synthesis. Intel Technology Journal, 2005
- Fat-trees: Universal networks for hardware-efficient supercomputing. IEEE Transactions on Computers, 1985