CORF
- 4 April 2019
- conference paper
- Published by Association for Computing Machinery (ACM)
Abstract
The Register File (RF) in GPUs is a critical structure that maintains the state for the thousands of threads that support the GPU processing model. The RF organization substantially affects both the overall performance and the energy efficiency of a GPU. For example, frequent accesses to the RF consume a substantial share of the dynamic energy. Port contention also degrades performance: operand collectors and register file banks have limited ports, so register operations are serialized. We present CORF, a compiler-assisted Coalescing Operand Register File. CORF performs register coalescing by combining reads of multiple registers required by a single instruction into a single physical read. To enable register coalescing, CORF uses register packing to co-locate narrow-width operands in the same physical register. CORF uses compiler hints to identify which register pairs are commonly accessed together. CORF saves dynamic energy by reducing the number of physical register file accesses. It improves performance by combining read operations and by reducing pressure on the register file. To increase the coalescing opportunities, we re-architect the physical register file to allow coalescing reads across different physical registers that reside in mutually exclusive sub-banks; we call this design CORF++. The compiler analysis for register allocation for CORF++ becomes a form of graph coloring called the bipartite edge frustration problem. CORF++ reduces the dynamic energy of the RF by 17% and improves IPC by 9%.
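The abstract does not describe the CORF++ allocator itself. What follows is a minimal illustrative sketch of the bipartite edge frustration formulation, built on simplifying assumptions:

- There are exactly two sub-banks.
- Each edge is weighted by how often one instruction reads both registers.
- Brute-force search stands in for whatever heuristic the paper actually uses.

Two registers can be coalesced only when they sit in different sub-banks. The cost to minimize is therefore the total weight of "frustrated" edges, whose endpoints share a sub-bank. The function names `frustration` and `assign_sub_banks` and the example register graph are hypothetical.

```python
from itertools import product

def frustration(assignment, edges):
    """Total weight of edges whose two registers share a sub-bank,
    i.e. pairs whose reads cannot be coalesced under this model."""
    return sum(w for (u, v), w in edges.items() if assignment[u] == assignment[v])

def assign_sub_banks(registers, edges):
    """Try every assignment of registers to sub-bank 0 or 1 and keep the one
    with minimum frustration. Exponential in len(registers): illustration only."""
    best, best_cost = None, None
    for banks in product((0, 1), repeat=len(registers)):
        assignment = dict(zip(registers, banks))
        cost = frustration(assignment, edges)
        if best_cost is None or cost < best_cost:
            best, best_cost = assignment, cost
    return best, best_cost

# Hypothetical access-affinity graph: weight = times an instruction reads both registers.
registers = ["r0", "r1", "r2", "r3"]
edges = {("r0", "r1"): 5, ("r1", "r2"): 3, ("r0", "r2"): 1, ("r2", "r3"): 4}

assignment, cost = assign_sub_banks(registers, edges)
print(assignment, cost)  # prints {'r0': 0, 'r1': 1, 'r2': 0, 'r3': 1} 1
```

The triangle r0, r1, r2 cannot be split perfectly across two sub-banks, so at least one of its edges must be frustrated. The search sacrifices the lightest one, (r0, r2). With two sub-banks, minimizing the weight of frustrated edges is the same as maximizing the weight of edges cut by the partition. That is the max-cut problem, which is NP-hard in general, so a practical compiler pass would rely on a heuristic rather than exhaustive search.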
Funding Information
- NSF (CNS-1619450)
- NSF (CNS-1422401, CNS-1619322)