GPU register file virtualization
- 5 December 2015
- conference paper
- Published by Association for Computing Machinery (ACM)
- p. 420-432
- https://doi.org/10.1145/2830772.2830784
Abstract
To support a massive number of parallel thread contexts, Graphics Processing Units (GPUs) use a huge register file, which is responsible for a large fraction of a GPU's total power and area. The conventional belief is that a large register file is inevitable for accommodating more parallel thread contexts, and that technology scaling makes it feasible to incorporate an ever-larger register file. In this paper, we demonstrate that the register file need not be large to accommodate more thread contexts. We first characterize the useful lifetime of a register and show that lifetimes vary drastically across the registers allocated to a kernel. While some registers are alive for the entire duration of the kernel execution, others have a short lifespan. We propose GPU register file virtualization, which allows multiple warps to share physical registers. Since warps may be scheduled for execution at different points in time, we propose to proactively release dead registers from one warp and re-allocate them to a different warp that is scheduled later, thereby reducing the needless demand for physical registers. By using register virtualization, we shrink the architected register space to a smaller physical register space. By under-provisioning the physical register file to be smaller than the architected register file, we reduce dynamic and static power consumption. We then develop a new register throttling mechanism to run applications that exceed the size of the under-provisioned register file without any deadlock. Our evaluation shows that even after halving the architected register file size using our proposed GPU register file virtualization, applications run successfully with negligible performance overhead.
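The lifetime argument in the abstract can be illustrated with a small toy model. The sketch below is not the paper's mechanism: the six-instruction program, the single-assignment assumption, the one-instruction-per-time-step model, and the function names `lifetimes`, `live_counts`, and `peak_shared_demand` are all hypothetical and chosen only for illustration. It counts how many registers are live at each step of one warp, then adds up the demand of several warps that start at different times, as a rough stand-in for releasing dead registers and handing them to a later warp.

```python
from typing import Dict, List, Tuple

# Each instruction is (destination register, [source registers]).
# Invented example program; every register is written exactly once.
Instr = Tuple[str, List[str]]

PROGRAM: List[Instr] = [
    ("r0", []),            # 0: r0 = load
    ("r1", []),            # 1: r1 = load
    ("r2", ["r0", "r1"]),  # 2: r2 = r0 + r1  (last read of r1)
    ("r3", ["r2"]),        # 3: r3 = r2 * 2   (last read of r2)
    ("r4", ["r3", "r0"]),  # 4: r4 = r3 + r0  (last read of r0 and r3)
    ("r5", ["r4"]),        # 5: r5 = r4       (last read of r4)
]


def lifetimes(program: List[Instr]) -> Dict[str, Tuple[int, int]]:
    """Map each register to (index of its write, index of its last read)."""
    spans: Dict[str, Tuple[int, int]] = {}
    for i, (dst, srcs) in enumerate(program):
        for src in srcs:
            first, _ = spans[src]
            spans[src] = (first, i)   # extend the lifetime to this read
        spans[dst] = (i, i)           # a new value is born at its write
    return spans


def live_counts(program: List[Instr]) -> List[int]:
    """Number of live registers at each instruction of a single warp."""
    spans = lifetimes(program)
    return [
        sum(1 for first, last in spans.values() if first <= i <= last)
        for i in range(len(program))
    ]


def peak_shared_demand(program: List[Instr], start_times: List[int]) -> int:
    """Peak physical-register demand when warps that start at the given
    time steps share one pool and hold only their live registers."""
    counts = live_counts(program)
    demand = [0] * (max(start_times) + len(counts))
    for start in start_times:
        for i, count in enumerate(counts):
            demand[start + i] += count
    return max(demand)


if __name__ == "__main__":
    architected = len(lifetimes(PROGRAM))        # registers named per warp
    print("architected per warp:", architected)
    print("live per step:", live_counts(PROGRAM))
    print("static demand, 2 warps:", 2 * architected)
    print("shared demand, 2 warps:", peak_shared_demand(PROGRAM, [0, 3]))
```

In this toy setting, two staggered warps need fewer physical registers at their peak than the static allocation of all architected registers for both warps, which is the intuition behind under-provisioning. The model ignores everything the throttling mechanism exists to handle, such as warps whose combined live registers exceed the physical pool.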
Funding Information
- Defense Advanced Research Projects Agency (PERFECT-HR0011-12-2-0020)
- National Science Foundation (CAREER-0954211)