Demystifying the Placement Policies of the NVIDIA GPU Thread Block Scheduler for Concurrent Kernels
- 5 March 2021
- journal article
- research article
- Published by Association for Computing Machinery (ACM) in ACM SIGMETRICS Performance Evaluation Review
- Vol. 48 (3), 81-88
- https://doi.org/10.1145/3453953.3453972
Abstract
In this work, we empirically derive the scheduler's behavior under concurrent workloads for NVIDIA's Pascal, Volta, and Turing microarchitectures. In contrast to past studies that suggest the scheduler uses a round-robin policy to assign thread blocks to streaming multiprocessors (SMs), we instead find that the scheduler chooses the next SM based on the SM's local resource availability. We show how this scheduling policy can lead to significant, and seemingly counter-intuitive, performance degradation; for example, a decrease of one thread per block resulted in a 3.58X increase in execution time for one kernel in our experiments. We hope that our work will be useful for improving the accuracy of GPU simulators and aid in the development of novel scheduling algorithms.
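The contrast the abstract draws between round-robin placement and placement driven by each SM's local resource availability can be illustrated with a toy model. The sketch below is not the policy derived in the paper: it assumes a single resource (thread slots per SM), assumes the availability-based scheduler picks the SM with the most free slots, and assumes ties go to the lowest SM index. The function names and parameters are hypothetical.

```python
def place_round_robin(num_sms, capacity, block_sizes):
    """Toy round-robin: try SMs in cyclic order, starting after the last SM used."""
    free = [capacity] * num_sms
    placement = []
    next_sm = 0
    for size in block_sizes:
        for step in range(num_sms):
            sm = (next_sm + step) % num_sms
            if free[sm] >= size:
                free[sm] -= size
                placement.append(sm)
                next_sm = (sm + 1) % num_sms
                break
        else:
            placement.append(None)  # no SM has room; the block would have to wait
    return placement


def place_most_available(num_sms, capacity, block_sizes):
    """Toy availability-based policy (assumption): pick the SM with the most
    free thread slots, breaking ties in favor of the lowest SM index."""
    free = [capacity] * num_sms
    placement = []
    for size in block_sizes:
        sm = max(range(num_sms), key=lambda i: (free[i], -i))
        if free[sm] >= size:
            free[sm] -= size
            placement.append(sm)
        else:
            placement.append(None)  # even the emptiest SM lacks room
    return placement


if __name__ == "__main__":
    # One 768-thread block followed by three 256-thread blocks on 3 SMs
    # with 1024 thread slots each (all values are illustrative).
    blocks = [768, 256, 256, 256]
    print(place_round_robin(3, 1024, blocks))
    print(place_most_available(3, 1024, blocks))
```

Even in this simplified setting the two policies diverge on the fourth block: round-robin wraps back to the first SM, while the availability-based rule avoids it because it is already the most heavily loaded. Real hardware tracks more than one resource (registers, shared memory, block slots), so this only conveys the general idea.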