GPU register file virtualization

5 December 2015

conference paper
Published by Association for Computing Machinery (ACM)

p. 420-432
https://doi.org/10.1145/2830772.2830784

Abstract

To support a massive number of parallel thread contexts, Graphics Processing Units (GPUs) use a huge register file, which is responsible for a large fraction of a GPU's total power and area. The conventional belief is that a large register file is inevitable for accommodating more parallel thread contexts, and that technology scaling makes it feasible to incorporate ever-larger register files. In this paper, we demonstrate that the register file need not be large to accommodate more thread contexts. We first characterize the useful lifetime of a register and show that lifetimes vary drastically across the registers allocated to a kernel. While some registers are alive for the entire duration of the kernel execution, others have a short lifespan. We propose GPU register file virtualization, which allows multiple warps to share physical registers. Since warps may be scheduled for execution at different points in time, we propose to proactively release dead registers from one warp and re-allocate them to a different warp that executes later, thereby reducing the needless demand for physical registers. By using register virtualization, we shrink the architected register space to a smaller physical register space. By under-provisioning the physical register file to be smaller than the architected register file, we reduce dynamic and static power consumption. We then develop a new register throttling mechanism to run applications that exceed the size of the under-provisioned register file without any deadlock. Our evaluation shows that even after halving the architected register file size, applications using our proposed GPU register file virtualization run successfully with negligible performance overhead.
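
The release-and-reallocate idea in the abstract can be illustrated with a toy simulation. The sketch below is not the paper's design. It assumes a hypothetical five-instruction kernel trace (`KERNEL`) in which each instruction lists the source registers that die after it executes, a round-robin warp scheduler, and a made-up `simulate` helper that maps architected registers to physical ones on first write and frees them at their last read. The paper's throttling mechanism is not modeled; the sketch only raises an error when no warp can make progress, which shows why such a mechanism is needed.

```python
# Hypothetical instruction trace: (dest, sources, sources_dead_after).
# r0 stays live for the whole kernel; r1, r2 and r3 are short-lived.
KERNEL = [
    ("r0", [], []),
    ("r1", [], []),
    ("r2", ["r0", "r1"], ["r1"]),
    ("r3", ["r2", "r0"], ["r2"]),
    (None, ["r3", "r0"], ["r3", "r0"]),
]


def simulate(num_warps, num_physical, kernel=KERNEL):
    """Run warps round-robin over a shared pool of physical registers.

    Returns (peak physical registers in use, number of stalled issue slots).
    """
    free = list(range(num_physical))   # unallocated physical registers
    mapping = {}                       # (warp, architected reg) -> physical reg
    pc = [0] * num_warps               # next instruction index per warp
    peak = stalls = 0

    while any(p < len(kernel) for p in pc):
        progressed = False
        for w in range(num_warps):
            if pc[w] >= len(kernel):
                continue
            dest, _srcs, dead = kernel[pc[w]]
            needs_alloc = dest is not None and (w, dest) not in mapping
            freed_now = sum((w, r) in mapping for r in dead)
            # Stall if a destination is needed but no register is available,
            # even after this instruction's dead sources are released.
            if needs_alloc and len(free) + freed_now == 0:
                stalls += 1
                continue
            # Release dead sources first so the destination can reuse them.
            for r in dead:
                free.append(mapping.pop((w, r)))
            if needs_alloc:
                mapping[(w, dest)] = free.pop()
            peak = max(peak, len(mapping))
            pc[w] += 1
            progressed = True
        if not progressed:
            # The paper's throttling is what prevents this; it is not modeled.
            raise RuntimeError("no warp can make progress")
    return peak, stalls


if __name__ == "__main__":
    # Two warps x four architected registers = 8 registers of static demand.
    print(simulate(2, 4))  # peak physical usage with a half-size pool
    print(simulate(2, 3))  # a smaller pool still completes, with stalls
```

In this toy trace, two warps with four architected registers each would statically need eight registers, yet the run completes with a pool of four because each warp holds at most two live values at a time. Shrinking the pool to two registers leaves both warps waiting on each other, which is the situation the abstract's throttling mechanism is meant to avoid.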

Funding Information

Defense Advanced Research Projects Agency (PERFECT-HR0011-12-2-0020)
National Science Foundation (CAREER-0954211)

Cited by 61 articles