Refresh Triggered Computation
- 30 December 2020
- research article
- Published by Association for Computing Machinery (ACM) in ACM Transactions on Architecture and Code Optimization
- Vol. 18 (1), 1-29
- https://doi.org/10.1145/3417708
Abstract
To employ a Convolutional Neural Network (CNN) in an energy-constrained embedded system, the CNN implementation must be highly energy efficient. Many recent studies propose CNN accelerator architectures with custom computation units that improve the energy efficiency and performance of CNNs by minimizing data transfers from DRAM-based main memory. However, in these architectures, DRAM is still responsible for half of the overall energy consumption of the system, on average. A key contributor to DRAM's high energy consumption is the refresh overhead, which is estimated to consume 40% of the total DRAM energy. In this article, we propose a new mechanism, Refresh Triggered Computation (RTC), that exploits the memory access patterns of CNN applications to reduce the number of refresh operations. RTC uses two major techniques to mitigate the refresh overhead. First, Refresh Triggered Transfer (RTT) is based on our new observation that a CNN application accesses a large portion of DRAM in a predictable and recurring manner; because the application's read/write accesses inherently refresh the DRAM rows they touch, a significant fraction of refresh operations can be skipped. Second, Partial Array Auto-Refresh (PAAR) eliminates refresh operations to DRAM regions that do not store any data. We propose three RTC designs (min-RTC, mid-RTC, and full-RTC), each requiring a different level of customization to the DRAM subsystem. All of our designs have small overhead: even the most aggressive design (full-RTC) imposes an area overhead of only 0.18% in a 16 Gb DRAM chip, and the relative overhead shrinks for denser chips. Our experimental evaluation on six well-known CNNs shows that RTC reduces average DRAM energy consumption by 24.4% for the least aggressive RTC implementation and by 61.3% for the most aggressive one. Beyond CNNs, we also evaluate RTC on three workloads from other domains, showing that it saves 31.9% and 16.9% of DRAM energy for Face Recognition and Bayesian Confidence Propagation Neural Network (BCPNN), respectively. We believe RTC can be applied to other applications whose memory access patterns remain predictable for a sufficiently long time.
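To make the two techniques concrete, here is a minimal sketch of the refresh-skipping logic the abstract describes. This is a hypothetical Python model, not the paper's hardware design: the row count, the 64 ms retention window, and the access pattern are all assumptions chosen for illustration. A row's refresh is skipped either because a recent read/write already recharged it (RTT) or because the row holds no live data (PAAR).

```python
# Hypothetical sketch, NOT the paper's hardware design: a toy model that counts
# how many per-row refreshes the RTT and PAAR rules would skip. All parameters
# (64 ms retention, 1024 rows, the access pattern) are illustrative assumptions.

RETENTION_MS = 64.0       # assumed DRAM retention window
NUM_ROWS = 1024           # toy DRAM size (rows)

class ToyDram:
    def __init__(self, allocated_rows):
        self.allocated = set(allocated_rows)  # rows holding live data (PAAR input)
        self.last_access = {}                 # row -> time of last read/write (RTT input)
        self.issued = 0                       # refreshes actually performed
        self.skipped = 0                      # refreshes avoided by RTT or PAAR

    def access(self, row, now_ms):
        # A read or write restores the row's charge, acting as an implicit refresh.
        self.last_access[row] = now_ms

    def refresh_cycle(self, now_ms):
        # Walk all rows once per retention window, as a refresh scheduler would.
        for row in range(NUM_ROWS):
            if row not in self.allocated:
                self.skipped += 1             # PAAR: row stores no data
            elif now_ms - self.last_access.get(row, float("-inf")) < RETENTION_MS:
                self.skipped += 1             # RTT: a recent access already refreshed it
            else:
                self.issued += 1              # fall back to a normal refresh

# Rows 0-511 hold data, but only rows 0-255 are swept every 16 ms,
# mimicking a recurring CNN-like access pattern.
dram = ToyDram(allocated_rows=range(512))
for step in range(1, 41):                     # simulate 40 * 16 ms = 640 ms
    now = step * 16.0
    for row in range(256):
        dram.access(row, now)
    if now % RETENTION_MS == 0:
        dram.refresh_cycle(now)

print(f"refreshes issued: {dram.issued}, skipped: {dram.skipped}")
```

In this toy setup the sketch reports 2,560 issued and 7,680 skipped refreshes: the unused half of the array is covered by PAAR, the frequently swept working set by RTT, and only the allocated-but-cold rows still need conventional refreshes.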
Funding Information
- Intel Corporation
- Semiconductor Research Corporation
- VMware
- Microsoft
- Alibaba
- Huawei Technologies