Power-efficient computing for compute-intensive GPGPU applications
- 1 February 2013
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE) in 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA)
Abstract
The peak compute performance of GPUs has been increased by integrating more compute resources and operating them at higher frequency. However, such approaches significantly increase power consumption of GPUs, limiting further performance increase due to the power constraint. Facing such a challenge, we propose three techniques to improve power efficiency and performance of GPUs in this paper. First, we observe that many GPGPU applications are integer-intensive. For such applications, we combine a pair of dependent integer instructions into a composite instruction that can be executed by an enhanced fused multiply-add unit. Second, we observe that computations for many instructions are duplicated across multiple threads. We dynamically detect such instructions and execute them in a separate scalar unit. Finally, we observe that 16 or fewer bits are sufficient for accurate representation of operands and results of many instructions. Thus, we split the 32-bit datapath into two 16-bit datapath slices that can concurrently issue and execute up to two such instructions per cycle. All three proposed techniques can considerably increase utilization of compute resources, improving power efficiency and performance by 20% and 15%, respectively.
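The three observations in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the paper's hardware design: the function names (`composite_mul_add`, `is_uniform`, `is_narrow`, `classify`) are hypothetical, values are treated as unsigned 32-bit integers, and the checks run in software here, whereas the abstract describes detection in hardware at run time.

```python
MASK32 = 0xFFFFFFFF  # model 32-bit wrapping integer registers


def composite_mul_add(a, b, c):
    """One composite op standing in for the dependent pair t = a * b; r = t + c."""
    return (a * b + c) & MASK32


def is_uniform(operands_per_thread):
    """True when every thread supplies the same operand tuple (duplicated work)."""
    first = operands_per_thread[0]
    return all(ops == first for ops in operands_per_thread)


def is_narrow(operands, result):
    """True when all operands and the result fit in 16 unsigned bits (assumption)."""
    return all(0 <= v <= 0xFFFF for v in (*operands, result))


def classify(operands_per_thread):
    """Return (results, path) for one multiply-add across a group of threads.

    path is "scalar" if the work is identical in every thread, "narrow" if
    every thread's operands and result fit in 16 bits, and "full" otherwise.
    """
    results = [composite_mul_add(*ops) for ops in operands_per_thread]
    if is_uniform(operands_per_thread):
        return results, "scalar"
    if all(is_narrow(ops, r) for ops, r in zip(operands_per_thread, results)):
        return results, "narrow"
    return results, "full"
```

In this model, a group of threads that all compute `2 * 3 + 4` is labelled "scalar", a group whose values all stay below 2^16 is labelled "narrow", and any other group falls back to the full 32-bit path. Checking for uniform operands before checking for narrow ones is an arbitrary choice made for this sketch.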