From high-level deep neural models to FPGAs
- 1 October 2016
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
Abstract
Deep Neural Networks (DNNs) are compute-intensive learning models with growing applicability in a wide range of domains. FPGAs are an attractive choice for DNNs since they offer a programmable substrate for acceleration and are becoming available across different market segments. However, obtaining both performance and energy efficiency with FPGAs is a laborious task even for expert hardware designers. Furthermore, the large memory footprint of DNNs, coupled with the FPGAs' limited on-chip storage, makes DNN acceleration using FPGAs more challenging. This work tackles these challenges by devising DnnWeaver, a framework that automatically generates a synthesizable accelerator for a given (DNN, FPGA) pair from a high-level specification in Caffe [1]. To achieve large benefits while preserving automation, DnnWeaver generates accelerators using hand-optimized design templates. First, DnnWeaver translates a given high-level DNN specification to its novel ISA that represents a macro dataflow graph of the DNN. The DnnWeaver compiler is equipped with our optimization algorithm that tiles, schedules, and batches DNN operations to maximize data reuse and best utilize the target FPGA's memory and other resources. The final result is a custom synthesizable accelerator that best matches the needs of the DNN while providing high performance and efficiency gains for the target FPGA. We use DnnWeaver to generate accelerators for a set of eight different DNN models and three different FPGAs: Xilinx Zynq, Altera Stratix V, and Altera Arria 10. We use hardware measurements to compare the generated accelerators to both multicore CPUs (ARM Cortex A15 and Xeon E3) and many-core GPUs (Tegra K1, GTX 650Ti, and Tesla K40). In comparison, the generated accelerators deliver superior performance and efficiency without requiring the programmers to participate in the arduous task of hardware design.
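The abstract's mention of tiling DNN operations to fit limited on-chip storage can be illustrated with a minimal sketch. This is not DnnWeaver's actual compiler algorithm; it is a generic loop-tiling example (the `tiled_matmul` function and the `tile` parameter are hypothetical names chosen here) showing how a layer's matrix multiply is broken into blocks small enough for a fixed-size buffer, so each loaded block is reused many times before being evicted.

```python
# Illustrative sketch, not DnnWeaver's algorithm: tile a fully connected
# layer's matrix multiply (C = A @ B) into blocks that would fit in a
# small on-chip buffer, maximizing reuse of each loaded block.

def tiled_matmul(A, B, tile):
    n, k = len(A), len(A[0])
    m = len(B[0])
    C = [[0.0] * m for _ in range(n)]
    # Outer loops walk over tiles; inner loops compute entirely within
    # one (i, j, k) tile, reusing its elements repeatedly.
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for k0 in range(0, k, tile):
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        acc = C[i][j]
                        for kk in range(k0, min(k0 + tile, k)):
                            acc += A[i][kk] * B[kk][j]
                        C[i][j] = acc
    return C
```

On an FPGA, the tile size would be chosen by the compiler to match the target device's block RAM capacity; in software the same transformation improves cache locality.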
This publication has 25 references indexed in Scilit:
- DjiNN and Tonic. Published by Association for Computing Machinery (ACM), 2015
- ShiDianNao. Published by Association for Computing Machinery (ACM), 2015
- ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision, 2015
- Caffe. Published by Association for Computing Machinery (ACM), 2014
- DianNao. ACM SIGPLAN Notices, 2014
- Convolution engine. ACM SIGARCH Computer Architecture News, 2013
- Bundled execution of recurring traces for energy-efficient general purpose processing. Published by Association for Computing Machinery (ACM), 2011
- A dynamically configurable coprocessor for convolutional neural networks. ACM SIGARCH Computer Architecture News, 2010
- A Fast Learning Algorithm for Deep Belief Nets. Neural Computation, 2006
- Gradient-based learning applied to document recognition. Proceedings of the IEEE, 1998