Olympus: Reaching Memory-Optimality on DNN Processors
- 14 September 2021
- journal article (research article)
- Published by the Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Computers
- Vol. 71, Issue 8
- https://doi.org/10.1109/tc.2021.3112262
Abstract
In DNN processors, main-memory accesses consume far more energy than arithmetic operations. Many memory-oriented network scheduling (MONS) techniques have therefore been introduced to exploit on-chip data-reuse opportunities and reduce memory accesses. However, deriving the theoretical lower bound of memory overhead for DNNs remains a significant challenge, one that also sheds light on how to reach memory-level optimality through network scheduling. Prior work on MONS has mainly focused on disparate optimization techniques or missed some of the data-reuse opportunities in diverse network models, so its results are likely to deviate from the true memory-optimality achievable on processors. This paper introduces Olympus, which comprehensively considers the entire memory-level DNN scheduling space and formally analyzes the true memory-optimality, as well as how to reach memory-optimal schedules, for an arbitrary DNN running on a DNN processor. The key idea behind Olympus is to derive a true memory lower bound that accounts for both intra-layer and inter-layer reuse opportunities, which prior works have not explored simultaneously. Evaluation on state-of-the-art DNN processors of different architectures shows that Olympus guarantees the minimum off-chip memory access, reducing DRAM accesses by 12.3-85.6% and saving 7.4-70.3% energy on the latest network models.
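The inter-layer reuse the abstract refers to can be illustrated with a back-of-the-envelope traffic count. The sketch below (not Olympus itself; all layer shapes are assumed values chosen for the example, and an idealized on-chip buffer is assumed to capture all intra-layer reuse) compares the off-chip words moved when two consecutive convolution layers are scheduled independently versus fused so that the intermediate feature map never leaves the chip:

```python
# Illustrative sketch: off-chip traffic for two consecutive conv layers,
# scheduled independently vs. fused (inter-layer reuse of the intermediate).
# Shapes (56x56 feature maps, 64 channels, 3x3 kernels) are assumed examples.

def conv_traffic(h, w, c_in, c_out, k):
    """Idealized DRAM words moved for one conv layer with no inter-layer
    reuse: read the full input and weights once, write the full output once."""
    inp = h * w * c_in           # input feature map
    wts = k * k * c_in * c_out   # weights
    out = h * w * c_out          # output ('same' padding keeps spatial size)
    return inp + wts + out

def fused_traffic(h, w, c0, c1, c2, k):
    """Same two layers fused: the intermediate feature map stays on chip,
    eliminating both its write (layer 1) and its re-read (layer 2)."""
    t1 = conv_traffic(h, w, c0, c1, k)
    t2 = conv_traffic(h, w, c1, c2, k)
    inter = h * w * c1           # intermediate tensor, moved twice when unfused
    return t1 + t2 - 2 * inter

unfused = 2 * conv_traffic(56, 56, 64, 64, 3)
fused = fused_traffic(56, 56, 64, 64, 64, 3)
print(f"unfused: {unfused} words, fused: {fused} words, "
      f"saved: {1 - fused / unfused:.1%}")
```

Even this crude model shows a large fraction of DRAM traffic disappearing once inter-layer reuse is counted, which is why a lower bound that considers only intra-layer reuse overestimates the true minimum.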
Funding Information
- National Natural Science Foundation of China (61876173)
- Strategic Priority Research Program of Chinese Academy of Science (XDC05030201)