Energy-efficient mechanisms for managing thread context in throughput processors
- 4 June 2011
- journal article
- conference paper
- Published by Association for Computing Machinery (ACM) in ACM SIGARCH Computer Architecture News
- Vol. 39 (3), 235-246
- https://doi.org/10.1145/2024723.2000093
Abstract
Modern graphics processing units (GPUs) use a large number of hardware threads to hide both function unit and memory access latency. Extreme multithreading requires a complicated thread scheduler as well as a large register file, which is expensive to access both in terms of energy and latency. We present two complementary techniques for reducing energy on massively-threaded processors such as GPUs. First, we examine register file caching to replace accesses to the large main register file with accesses to a smaller structure containing the immediate register working set of active threads. Second, we investigate a two-level thread scheduler that maintains a small set of active threads to hide ALU and local memory access latency and a larger set of pending threads to hide main memory latency. Combined with register file caching, a two-level thread scheduler provides a further reduction in energy by limiting the allocation of temporary register cache resources to only the currently active subset of threads. We show that on average, across a variety of real world graphics and compute workloads, a 6-entry per-thread register file cache reduces the number of reads and writes to the main register file by 50% and 59% respectively. We further show that the active thread count can be reduced by a factor of 4 with minimal impact on performance, resulting in a 36% reduction of register file energy.
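The register file caching idea in the abstract can be illustrated with a small counting model. The sketch below is not the paper's design: it assumes FIFO replacement, allocation on writes only, read misses served directly from the main register file (MRF) without allocating an entry, and write-back of every cached value on eviction or when the thread is descheduled. The class name, the `run_trace` helper, and the toy trace are all hypothetical and chosen only for illustration.

```python
from collections import deque


class RegisterFileCache:
    """Toy per-thread register file cache (RFC) in front of a main register file.

    Assumptions (not taken from the paper): FIFO replacement, entries are
    allocated only on writes, read misses go straight to the MRF, and every
    cached value is written back to the MRF on eviction or flush.
    """

    def __init__(self, entries=6):
        self.entries = entries
        self.fifo = deque()      # cached register ids, oldest first
        self.mrf_reads = 0       # reads that had to access the MRF
        self.mrf_writes = 0      # values written back to the MRF

    def read(self, reg):
        if reg not in self.fifo:
            self.mrf_reads += 1  # miss: operand is fetched from the MRF

    def write(self, reg):
        if reg in self.fifo:
            return               # overwrite in place; the old value is never written back
        if len(self.fifo) >= self.entries:
            self.fifo.popleft()  # evict the oldest entry
            self.mrf_writes += 1 # and write its value back to the MRF
        self.fifo.append(reg)

    def flush(self):
        """Write back all cached values, e.g. when the thread leaves the active set."""
        self.mrf_writes += len(self.fifo)
        self.fifo.clear()


def run_trace(trace, entries=6):
    """Replay ("r" | "w", register) pairs and return (MRF reads, MRF writes)."""
    rfc = RegisterFileCache(entries)
    for op, reg in trace:
        if op == "r":
            rfc.read(reg)
        else:
            rfc.write(reg)
    rfc.flush()
    return rfc.mrf_reads, rfc.mrf_writes


# Hypothetical three-instruction sequence:
#   r3 = r1 + r2 ; r4 = r3 * r3 ; r3 = r4 + r1
trace = [("r", 1), ("r", 2), ("w", 3),
         ("r", 3), ("r", 3), ("w", 4),
         ("r", 4), ("r", 1), ("w", 3)]

if __name__ == "__main__":
    print(run_trace(trace, entries=6))
```

Without a cache, this toy trace performs 6 reads and 3 writes on the MRF. Under the assumptions above, a 6-entry cache leaves 3 MRF reads and 2 MRF writes, because short-lived values are produced and consumed inside the small structure. The flush step also hints at why the abstract pairs the cache with a two-level scheduler: only threads in the active set need cache entries, and a thread's values return to the MRF when it is moved to the pending set.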