Weight Sparseness for a Feature-Map-Split-CNN Toward Low-Cost Embedded FPGAs

1 December 2021

journal article
research article
Published by Institute of Electronics, Information and Communications Engineers (IEICE) in IEICE Transactions on Information and Systems

Vol. E104.D (12), 2040-2047
https://doi.org/10.1587/transinf.2021pap0011

Abstract

Convolutional neural networks (CNNs) achieve high accuracy in image recognition and are used in embedded systems such as smartphones, robots, and self-driving cars. Low-end FPGAs are candidates for embedded image recognition platforms because they achieve real-time performance at a low cost. However, CNNs have a large number of parameters, called weights, and large internal data, called feature maps, which challenge FPGAs in both performance and memory capacity. To solve these problems, we exploit a split-CNN and weight sparseness. The split-CNN reduces the memory footprint by splitting the feature map into smaller patches, which allows the feature map to be stored in the FPGA's high-throughput on-chip memory. Weight sparseness reduces computational cost and yields even higher performance. We designed a dedicated architecture for a sparse CNN and a memory buffering schedule for a split-CNN, and implemented them on the PYNQ-Z1 board, which carries a low-end FPGA. An experiment on classification using VGG16 shows that our implementation is 3.1 times faster than a GPU and 5.4 times faster than an existing FPGA implementation.
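
The two ideas in the abstract can be illustrated with a toy software sketch. It is not the authors' hardware architecture or buffering schedule: the function names, the 8x8 feature map, the 2x2 patch grid, and the 3x3 kernel are all assumptions chosen for illustration, and each patch is convolved independently without any border handling. The sketch only shows that splitting shrinks the data held at one time and that skipping zero weights reduces the number of multiply-accumulate operations.

```python
def split_into_patches(fmap, rows, cols):
    """Split a 2-D feature map (list of lists) into rows x cols equal patches."""
    h, w = len(fmap), len(fmap[0])
    assert h % rows == 0 and w % cols == 0, "sketch assumes evenly divisible sizes"
    ph, pw = h // rows, w // cols
    return [
        [row[c * pw:(c + 1) * pw] for row in fmap[r * ph:(r + 1) * ph]]
        for r in range(rows)
        for c in range(cols)
    ]


def sparse_conv2d(patch, kernel):
    """'Valid' 2-D cross-correlation that skips zero weights.

    Returns the output map and the number of multiply-accumulates performed.
    """
    kh, kw = len(kernel), len(kernel[0])
    # Keep only the nonzero weights as (dy, dx, value) triples.
    nonzero = [(dy, dx, kernel[dy][dx])
               for dy in range(kh) for dx in range(kw) if kernel[dy][dx] != 0]
    oh, ow = len(patch) - kh + 1, len(patch[0]) - kw + 1
    out = [[0] * ow for _ in range(oh)]
    macs = 0
    for y in range(oh):
        for x in range(ow):
            for dy, dx, wgt in nonzero:
                out[y][x] += wgt * patch[y + dy][x + dx]
                macs += 1
    return out, macs


# Hypothetical toy data: an 8x8 feature map and a kernel with 3 of 9 weights nonzero.
fmap = [[y * 8 + x for x in range(8)] for y in range(8)]
kernel = [[0, 1, 0], [0, 2, 0], [0, 0, -1]]

patches = split_into_patches(fmap, 2, 2)        # four 4x4 patches
results = [sparse_conv2d(p, kernel) for p in patches]
total_macs = sum(m for _, m in results)
dense_macs = len(patches) * 2 * 2 * 9           # 2x2 outputs per patch, 9 weights each
print(total_macs, dense_macs)
```

In this toy case each patch holds a quarter of the feature map, and the sparse loop performs one third of the multiply-accumulates of a dense 3x3 kernel. Real speedups depend on the sparsity pattern and on how the hardware schedules nonzero weights.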
