Refresh Triggered Computation
- 30 December 2020
- research article
- Published by Association for Computing Machinery (ACM) in ACM Transactions on Architecture and Code Optimization
- Vol. 18 (1), 1-29
- https://doi.org/10.1145/3417708
Abstract
To employ a Convolutional Neural Network (CNN) in an energy-constrained embedded system, the CNN implementation must be highly energy efficient. Many recent studies propose CNN accelerator architectures with custom computation units that improve the energy efficiency and performance of CNNs by minimizing data transfers from DRAM-based main memory. However, in these architectures, DRAM is still responsible for half of the overall energy consumption of the system, on average. A key contributor to DRAM's high energy consumption is the refresh overhead, which is estimated to consume 40% of the total DRAM energy. In this article, we propose a new mechanism, Refresh Triggered Computation (RTC), that exploits the memory access patterns of CNN applications to reduce the number of refresh operations. RTC uses two major techniques to mitigate the refresh overhead. First, Refresh Triggered Transfer (RTT) is based on our new observation that a CNN application accesses a large portion of DRAM in a predictable and recurring manner; because the application's read/write accesses inherently refresh the DRAM rows they touch, a significant fraction of refresh operations can be skipped. Second, Partial Array Auto-Refresh (PAAR) eliminates refresh operations to DRAM regions that do not store any data. We propose three RTC designs (min-RTC, mid-RTC, and full-RTC), each requiring a different level of customization to the DRAM subsystem. All of our designs have small overhead: even the most aggressive design (full-RTC) imposes an area overhead of only 0.18% in a 16 Gb DRAM chip, and the relative overhead shrinks for denser chips. Our experimental evaluation on six well-known CNNs shows that RTC reduces average DRAM energy consumption by 24.4% for the least aggressive RTC implementation and by 61.3% for the most aggressive one. Beyond CNNs, we also evaluate RTC on three workloads from other domains, showing that it saves 31.9% and 16.9% of DRAM energy for Face Recognition and Bayesian Confidence Propagation Neural Network (BCPNN), respectively. We believe RTC can be applied to other applications whose memory access patterns remain predictable for a sufficiently long time.
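To make the two techniques concrete, here is a minimal sketch of the refresh-skipping logic the abstract describes. This is a hypothetical Python model, not the paper's hardware design: the row count, the 64 ms retention window, and the access pattern are all assumptions chosen for illustration. A row's refresh is skipped either because a recent read/write already recharged it (RTT) or because the row holds no live data (PAAR).

```python
# Hypothetical sketch, NOT the paper's hardware design: a toy model that counts
# how many per-row refreshes the RTT and PAAR rules would skip. All parameters
# (64 ms retention, 1024 rows, the access pattern) are illustrative assumptions.

RETENTION_MS = 64.0       # assumed DRAM retention window
NUM_ROWS = 1024           # toy DRAM size (rows)

class ToyDram:
    def __init__(self, allocated_rows):
        self.allocated = set(allocated_rows)  # rows holding live data (PAAR input)
        self.last_access = {}                 # row -> time of last read/write (RTT input)
        self.issued = 0                       # refreshes actually performed
        self.skipped = 0                      # refreshes avoided by RTT or PAAR

    def access(self, row, now_ms):
        # A read or write restores the row's charge, acting as an implicit refresh.
        self.last_access[row] = now_ms

    def refresh_cycle(self, now_ms):
        # Walk all rows once per retention window, as a refresh scheduler would.
        for row in range(NUM_ROWS):
            if row not in self.allocated:
                self.skipped += 1             # PAAR: row stores no data
            elif now_ms - self.last_access.get(row, float("-inf")) < RETENTION_MS:
                self.skipped += 1             # RTT: a recent access already refreshed it
            else:
                self.issued += 1              # fall back to a normal refresh

# Rows 0-511 hold data, but only rows 0-255 are swept every 16 ms,
# mimicking a recurring CNN-like access pattern.
dram = ToyDram(allocated_rows=range(512))
for step in range(1, 41):                     # simulate 40 * 16 ms = 640 ms
    now = step * 16.0
    for row in range(256):
        dram.access(row, now)
    if now % RETENTION_MS == 0:
        dram.refresh_cycle(now)

print(f"refreshes issued: {dram.issued}, skipped: {dram.skipped}")
```

In this toy setup the sketch reports 2,560 issued and 7,680 skipped refreshes: the unused half of the array is covered by PAAR, the frequently swept working set by RTT, and only the allocated-but-cold rows still need conventional refreshes.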
Funding Information
- Intel Corporation
- Semiconductor Research Corporation
- VMware
- Microsoft
- Alibaba
- Huawei Technologies