A Low-Latency Inference of Randomly Wired Convolutional Neural Networks on an FPGA
- 1 December 2021
- journal article
- research article
- Published by Institute of Electronics, Information and Communications Engineers (IEICE) in IEICE Transactions on Information and Systems
- Vol. E104.D (12), 2068-2077
- https://doi.org/10.1587/transinf.2021pap0010
Abstract
Convolutional neural networks (CNNs) are widely used for image processing tasks in both embedded systems and data centers. In data centers, high accuracy and low latency are desired for various tasks such as image processing of streaming videos. We propose an FPGA-based low-latency inference accelerator for randomly wired convolutional neural networks (RWCNNs), whose layer structures are based on random graph models. Because RWCNNs contain convolution layers with no direct dependencies between them, our architecture can process these layers efficiently in a pipeline. Because each layer takes the outputs of multiple layers as its input, we use an FPGA with HBM2 so that these inputs can be accessed in parallel through multiple HBM2 channels. We schedule the execution order of the layers to improve pipeline efficiency, build a conflict graph from the scheduling results, and then allocate the output of each layer to an HBM2 channel by coloring this graph. Because pipelined execution must be properly controlled, we developed a tool that automatically generates the hardware functions. We implemented the proposed architecture on the Alveo U50 FPGA. We investigated the trade-off between latency and recognition accuracy on the ImageNet classification task by comparing inference performance across different input image sizes. Compared with a conventional accelerator for ResNet-50, our accelerator reduces latency by a factor of 2.21. It also achieves 12.6 and 4.93 times better efficiency than a CPU and a GPU, respectively. Thus, our accelerator for RWCNNs is suitable for low-latency inference.
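The scheduling and channel-allocation steps described in the abstract can be illustrated with a small sketch. This is not the authors' tool: the toy layer graph, the function names (`topological_schedule`, `build_conflict_graph`, `assign_channels`), the `window` parameter, and both conflict rules are hypothetical assumptions. Layers are ordered with Kahn's topological sort. Two layer outputs are assumed to conflict if the same layer reads both (so they must be fetched in parallel) or if their producers sit within `window` consecutive slots of the schedule (assumed to be in the pipeline at the same time). A greedy coloring then maps each output to one of `num_channels` HBM2 channels.

```python
from collections import defaultdict
from itertools import combinations


def topological_schedule(preds):
    """Kahn's algorithm: order layers so every producer runs before its consumers.

    `preds` maps each layer name to the list of layers whose outputs it reads.
    Ties are broken alphabetically so the result is deterministic.
    """
    indeg = {n: len(ps) for n, ps in preds.items()}
    succs = defaultdict(list)
    for n, ps in preds.items():
        for p in ps:
            succs[p].append(n)
    ready = sorted(n for n, d in indeg.items() if d == 0)
    order = []
    while ready:
        n = ready.pop(0)
        order.append(n)
        for s in succs[n]:
            indeg[s] -= 1
            if indeg[s] == 0:
                ready.append(s)
        ready.sort()
    if len(order) != len(preds):
        raise ValueError("layer graph contains a cycle")
    return order


def build_conflict_graph(preds, order, window=2):
    """Undirected conflict graph over layer outputs (hypothetical conflict rules)."""
    conflicts = {n: set() for n in preds}

    def link(a, b):
        conflicts[a].add(b)
        conflicts[b].add(a)

    # Rule 1 (assumed): outputs read by the same layer must sit in different
    # channels so that they can be fetched in parallel.
    for ps in preds.values():
        for a, b in combinations(ps, 2):
            link(a, b)
    # Rule 2 (assumed): layers within `window` consecutive schedule slots are
    # treated as being in the pipeline together, so their writes must not collide.
    for i in range(len(order)):
        for j in range(i + 1, min(i + window, len(order))):
            link(order[i], order[j])
    return conflicts


def assign_channels(order, conflicts, num_channels):
    """Greedy coloring in schedule order: give each output the lowest free channel."""
    channel = {}
    for n in order:
        used = {channel[m] for m in conflicts[n] if m in channel}
        free = [c for c in range(num_channels) if c not in used]
        if not free:
            raise ValueError(f"{num_channels} channels are not enough for layer {n!r}")
        channel[n] = free[0]
    return channel


if __name__ == "__main__":
    # Toy layer graph: "d" reads three independent branches; "out" reads "d" and "a".
    preds = {"in": [], "a": ["in"], "b": ["in"], "c": ["in"],
             "d": ["a", "b", "c"], "out": ["d", "a"]}
    order = topological_schedule(preds)
    channels = assign_channels(order, build_conflict_graph(preds, order), num_channels=4)
    print(order)     # ['in', 'a', 'b', 'c', 'd', 'out']
    print(channels)  # {'in': 0, 'a': 1, 'b': 0, 'c': 2, 'd': 0, 'out': 1}
```

In the toy graph, the three inputs of `d` land in channels 1, 0, and 2, so they can be read concurrently. Greedy coloring in schedule order is a simple heuristic and does not guarantee the minimum number of colors; because the channel budget is fixed by the device, an allocator must also handle the case where the coloring needs more channels than are available (here it raises an error).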