Weight Sparseness for a Feature-Map-Split-CNN Toward Low-Cost Embedded FPGAs
- 1 December 2021
- journal article
- research article
- Published by Institute of Electronics, Information and Communications Engineers (IEICE) in IEICE Transactions on Information and Systems
- Vol. E104.D (12), 2040-2047
- https://doi.org/10.1587/transinf.2021pap0011
Abstract
Convolutional neural networks (CNNs) achieve a high recognition rate in image recognition and are used in embedded systems such as smartphones, robots, and self-driving cars. Low-end FPGAs are candidates for embedded image recognition platforms because they achieve real-time performance at a low cost. However, a CNN has a large number of parameters, called weights, and a large amount of internal data, called feature maps, which strain both the performance and the memory capacity of an FPGA. To solve these problems, we exploit a split-CNN and weight sparseness. The split-CNN reduces the memory footprint by splitting the feature map into smaller patches, which allows the feature map to be stored in the FPGA's high-throughput on-chip memory. Weight sparseness reduces the computational cost and achieves even higher performance. We designed a dedicated architecture for a sparse CNN and a memory buffering schedule for a split-CNN, and implemented them on the PYNQ-Z1 board, which carries a low-end FPGA. A classification experiment using VGG16 shows that our implementation is 3.1 times faster than the GPU and 5.4 times faster than an existing FPGA implementation.
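The two ideas named in the abstract can be illustrated with a minimal NumPy sketch. It is not the paper's hardware architecture or its buffering schedule. The function names `sparse_conv2d` and `split_conv2d`, the 2x2 patch grid, and the per-patch zero padding are all assumptions made for illustration. The sketch shows that skipping zero weights cuts the multiply-accumulate count in proportion to the number of nonzero weights, and that convolving patches independently keeps each working set small, at the price of outputs that differ from the unsplit result near patch borders.

```python
import numpy as np


def sparse_conv2d(fmap, weights):
    """Valid 2D cross-correlation that visits only nonzero weights.

    fmap:    array of shape (C_in, H, W)
    weights: array of shape (C_out, C_in, K, K)
    Returns the output feature map and the number of multiply-accumulates.
    """
    c_out, _, k, _ = weights.shape
    _, h, w = fmap.shape
    oh, ow = h - k + 1, w - k + 1
    out = np.zeros((c_out, oh, ow), dtype=fmap.dtype)
    macs = 0
    # Each nonzero weight contributes one shifted, scaled copy of its input channel.
    for o, i, dy, dx in zip(*np.nonzero(weights)):
        out[o] += weights[o, i, dy, dx] * fmap[i, dy:dy + oh, dx:dx + ow]
        macs += oh * ow
    return out, macs


def split_conv2d(fmap, weights, splits=2):
    """Convolve a splits x splits grid of patches independently (hypothetical helper).

    Each patch is zero-padded on its own, so no data crosses patch borders
    and only one patch needs to be resident at a time.
    """
    pad = weights.shape[-1] // 2  # assumes an odd kernel size
    rows = []
    for band in np.array_split(fmap, splits, axis=1):
        row = []
        for patch in np.array_split(band, splits, axis=2):
            padded = np.pad(patch, ((0, 0), (pad, pad), (pad, pad)))
            out, _ = sparse_conv2d(padded, weights)
            row.append(out)
        rows.append(np.concatenate(row, axis=2))
    return np.concatenate(rows, axis=1)
```

Calling `split_conv2d(fmap, weights)` on a `(3, 8, 8)` feature map with `(4, 3, 3, 3)` weights returns a `(4, 8, 8)` output. Pixels in the interior of each patch match the unsplit convolution, while pixels on patch borders do not, because neighbouring patches are replaced by zeros in this sketch.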