Power-efficient computing for compute-intensive GPGPU applications

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE) in 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA)

p. 330-341
https://doi.org/10.1109/hpca.2013.6522330

Abstract

The peak compute performance of GPUs has been increased by integrating more compute resources and operating them at higher frequencies. However, such approaches significantly increase GPU power consumption, and the power constraint limits further performance gains. To address this challenge, we propose three techniques that improve the power efficiency and performance of GPUs. First, we observe that many GPGPU applications are integer-intensive. For such applications, we combine a pair of dependent integer instructions into a composite instruction that can be executed by an enhanced fused multiply-add unit. Second, we observe that the computations of many instructions are duplicated across multiple threads. We dynamically detect such instructions and execute them in a separate scalar unit. Finally, we observe that 16 or fewer bits are sufficient to accurately represent the operands and results of many instructions. Thus, we split the 32-bit datapath into two 16-bit datapath slices that can concurrently issue and execute up to two such instructions per cycle. All three proposed techniques can considerably increase the utilization of compute resources, improving power efficiency and performance by 20% and 15%, respectively.
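The three observations in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the paper's hardware design: the function names, the use of a shift-then-add pair as the composite instruction, and the signed 16-bit range check are all hypothetical choices made for illustration.

```python
from typing import Callable, Sequence, Tuple


def fits_in_16_bits(value: int) -> bool:
    """Assumed narrowness test: value is representable as a signed 16-bit integer."""
    return -(1 << 15) <= value < (1 << 15)


def classify_warp_instruction(
    operands: Sequence[Tuple[int, ...]],
    op: Callable[..., int],
) -> str:
    """Classify one instruction given the source operands of each thread.

    Returns "scalar" if every thread has identical operands (the computation
    is duplicated across threads), "narrow" if all operands and results fit
    in 16 bits, and "full" otherwise. The labels are hypothetical.
    """
    first = operands[0]
    if all(ops == first for ops in operands):
        return "scalar"
    results = [op(*ops) for ops in operands]
    values = [v for ops in operands for v in ops] + results
    if all(fits_in_16_bits(v) for v in values):
        return "narrow"
    return "full"


def composite_shift_add(a: int, shift: int, c: int) -> int:
    """Assumed example of a dependent integer pair fused into one operation:
    t = a << shift; d = t + c, with 32-bit wraparound."""
    mask = 0xFFFFFFFF
    t = (a << shift) & mask
    return (t + c) & mask
```

In this model, a warp whose threads all add the same two values would be routed to the scalar path, while a warp adding small per-thread indices would qualify for a 16-bit slice.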
