From high-level deep neural models to FPGAs

Abstract
Deep Neural Networks (DNNs) are compute-intensive learning models with growing applicability in a wide range of domains. FPGAs are an attractive choice for DNNs since they offer a programmable substrate for acceleration and are becoming available across different market segments. However, obtaining both performance and energy efficiency with FPGAs is a laborious task even for expert hardware designers. Furthermore, the large memory footprint of DNNs, coupled with FPGAs' limited on-chip storage, makes DNN acceleration using FPGAs more challenging. This work tackles these challenges by devising DnnWeaver, a framework that automatically generates a synthesizable accelerator for a given (DNN, FPGA) pair from a high-level specification in Caffe [1]. To achieve large benefits while preserving automation, DnnWeaver generates accelerators using hand-optimized design templates. First, DnnWeaver translates a given high-level DNN specification into its novel ISA, which represents a macro dataflow graph of the DNN. The DnnWeaver compiler is equipped with our optimization algorithm that tiles, schedules, and batches DNN operations to maximize data reuse and best utilize the target FPGA's memory and other resources. The final result is a custom synthesizable accelerator that best matches the needs of the DNN while providing high performance and efficiency gains for the target FPGA. We use DnnWeaver to generate accelerators for a set of eight different DNN models and three different FPGAs: Xilinx Zynq, Altera Stratix V, and Altera Arria 10. Using hardware measurements, we compare the generated accelerators to both multicore CPUs (ARM Cortex A15 and Xeon E3) and many-core GPUs (Tegra K1, GTX 650Ti, and Tesla K40). The generated accelerators deliver superior performance and efficiency without requiring programmers to participate in the arduous task of hardware design.
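To make the abstract's tiling step concrete, the sketch below shows one way a resource-constrained tile search for a single convolution layer could look: pick tile sizes whose input, weight, and output footprints fit in on-chip storage while minimizing a simple model of off-chip traffic. This is a minimal, hypothetical illustration under stated assumptions, not DnnWeaver's actual compiler algorithm; all names (`ConvLayer`, `pick_tiling`, the traffic model, the 512 KB BRAM budget, and the example layer shape) are assumptions for the example.

```python
import itertools
from dataclasses import dataclass


@dataclass
class ConvLayer:
    """Shape of one convolution layer (hypothetical example fields)."""
    in_ch: int   # input channels
    out_ch: int  # output channels
    out_h: int   # output feature-map height
    out_w: int   # output feature-map width
    k: int       # square kernel size


def ceil_div(a, b):
    return -(-a // b)


def tile_bytes(layer, t_oc, t_ic, t_h, t_w, elem=2):
    """On-chip bytes to hold one tile of inputs, weights, and partial outputs."""
    inp = t_ic * (t_h + layer.k - 1) * (t_w + layer.k - 1)
    wts = t_oc * t_ic * layer.k * layer.k
    out = t_oc * t_h * t_w
    return (inp + wts + out) * elem


def dram_traffic(layer, t_oc, t_ic, t_h, t_w, elem=2):
    """Rough off-chip traffic model (an assumption, not DnnWeaver's):
    inputs are re-fetched once per output-channel tile, weights once per
    spatial tile; outputs are written once, partial sums staying on chip."""
    inp = ceil_div(layer.out_ch, t_oc) * layer.in_ch * layer.out_h * layer.out_w
    wts = (ceil_div(layer.out_h, t_h) * ceil_div(layer.out_w, t_w)
           * layer.out_ch * layer.in_ch * layer.k * layer.k)
    out = layer.out_ch * layer.out_h * layer.out_w
    return (inp + wts + out) * elem


def pick_tiling(layer, bram_bytes):
    """Exhaustively search tile sizes that fit in on-chip storage and
    minimize modeled DRAM traffic (i.e., maximize data reuse)."""
    cands = lambda n: sorted({min(n, c) for c in (1, 2, 4, 8, 16, 32, 64, 128, 256, n)})
    best, best_traffic = None, float("inf")
    for t_oc, t_ic, t_h, t_w in itertools.product(
            cands(layer.out_ch), cands(layer.in_ch),
            cands(layer.out_h), cands(layer.out_w)):
        if tile_bytes(layer, t_oc, t_ic, t_h, t_w) > bram_bytes:
            continue  # tile does not fit in the FPGA's on-chip buffers
        traffic = dram_traffic(layer, t_oc, t_ic, t_h, t_w)
        if traffic < best_traffic:
            best, best_traffic = (t_oc, t_ic, t_h, t_w), traffic
    return best, best_traffic


# Example: a layer roughly shaped like a mid-network convolution,
# with an assumed 512 KB of usable on-chip storage.
layer = ConvLayer(in_ch=256, out_ch=384, out_h=13, out_w=13, k=3)
print(pick_tiling(layer, bram_bytes=512 * 1024))
```

The search trades input reuse against weight reuse through the tile sizes it picks; a full compiler such as the one the abstract describes would additionally schedule tiles across layers and batch multiple inputs to amortize weight fetches.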
