Divergence-aware warp scheduling
- 7 December 2013
- conference paper
- Published by Association for Computing Machinery (ACM) in Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture - MICRO-46
Abstract
This paper uses hardware thread scheduling to improve the performance and energy efficiency of divergent applications on GPUs. We propose Divergence-Aware Warp Scheduling (DAWS), which introduces a divergence-based cache footprint predictor to estimate how much L1 data cache capacity is needed to capture intra-warp locality in loops. Predictor estimates are created from an online characterization of memory divergence and runtime information about the level of control flow divergence in warps. Unlike prior work on Cache-Conscious Wavefront Scheduling, which makes reactive scheduling decisions based on detected cache thrashing, DAWS makes proactive scheduling decisions based on cache usage predictions. DAWS uses these predictions to schedule warps such that data reused by active scalar threads is unlikely to exceed the capacity of the L1 data cache. DAWS attempts to shift the burden of locality management from software to hardware, increasing the performance of simpler and more portable code on the GPU. We compare the execution time of two Sparse Matrix Vector Multiply implementations and show that DAWS is able to run a simple, divergent version within 4% of a performance-optimized version that has been rewritten to make use of the on-chip scratchpad and have less memory divergence. We show that DAWS achieves a harmonic mean 26% performance improvement over Cache-Conscious Wavefront Scheduling on a diverse selection of highly cache-sensitive applications, with minimal additional hardware.
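The proactive scheduling idea in the abstract can be illustrated with a small sketch. This is not the paper's predictor or hardware design: the abstract does not give a formula, so the footprint model here (active threads times cache lines reused per thread), the greedy admission rule, and all names (`Warp`, `predicted_footprint`, `select_schedulable`, `lines_per_thread`) are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Warp:
    warp_id: int
    active_threads: int    # threads still active in the loop (control-flow divergence)
    lines_per_thread: int  # assumed: distinct cache lines each active thread reuses


def predicted_footprint(warp: Warp) -> int:
    """Assumed model: L1 lines needed to capture this warp's intra-warp locality."""
    return warp.active_threads * warp.lines_per_thread


def select_schedulable(warps, l1_capacity_lines):
    """Admit warps in order, skipping any whose predicted footprint would
    push the running total past the L1 capacity (all sizes in cache lines)."""
    admitted = []
    used = 0
    for warp in warps:
        need = predicted_footprint(warp)
        if used + need <= l1_capacity_lines:
            admitted.append(warp.warp_id)
            used += need
    return admitted, used


# Hypothetical workload: four warps with different divergence levels.
warps = [
    Warp(warp_id=0, active_threads=32, lines_per_thread=2),
    Warp(warp_id=1, active_threads=8, lines_per_thread=2),
    Warp(warp_id=2, active_threads=32, lines_per_thread=4),
    Warp(warp_id=3, active_threads=4, lines_per_thread=1),
]
admitted, used = select_schedulable(warps, l1_capacity_lines=128)
print(admitted, used)
```

In this toy model, a warp with fewer active threads has a smaller predicted footprint, so more such warps can be scheduled together before the predicted reuse set would overflow the cache. The decision is made from the prediction, before any thrashing is observed.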
Funding Information
- Natural Sciences and Engineering Research Council of Canada
- Nvidia