Demystifying the Placement Policies of the NVIDIA GPU Thread Block Scheduler for Concurrent Kernels

5 March 2021

journal article
research article
Published by Association for Computing Machinery (ACM) in ACM SIGMETRICS Performance Evaluation Review

Vol. 48 (3), 81-88
https://doi.org/10.1145/3453953.3453972

Abstract

In this work, we empirically derive the thread block scheduler's behavior under concurrent workloads for NVIDIA's Pascal, Volta, and Turing microarchitectures. In contrast to past studies that suggest the scheduler uses a round-robin policy to assign thread blocks to streaming multiprocessors (SMs), we instead find that the scheduler chooses the next SM based on the SM's local resource availability. We show how this scheduling policy can lead to significant, and seemingly counter-intuitive, performance degradation; for example, a decrease of one thread per block resulted in a 3.58X increase in execution time for one kernel in our experiments. We hope that our work will be useful for improving the accuracy of GPU simulators and will aid in the development of novel scheduling algorithms.
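
The contrast between the two placement policies can be illustrated with a toy model. The sketch below is not the scheduler described in the article: the abstract only says that the next SM is chosen from its local resource availability, so the "most free thread slots" rule, the single-resource model, and all names (`SM`, `round_robin`, `most_available`) are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class SM:
    free_threads: int  # thread slots still available on this SM (hypothetical model)


def round_robin(sms, blocks):
    """Place each block on the next SM in cyclic order that has room for it."""
    placement = []
    cursor = 0
    for threads in blocks:
        for step in range(len(sms)):
            idx = (cursor + step) % len(sms)
            if sms[idx].free_threads >= threads:
                sms[idx].free_threads -= threads
                placement.append(idx)
                cursor = idx + 1
                break
        else:
            placement.append(None)  # no SM has room; the block would wait
    return placement


def most_available(sms, blocks):
    """Place each block on the SM with the most free thread slots (assumed rule)."""
    placement = []
    for threads in blocks:
        # Ties go to the lowest SM index, an arbitrary choice for this sketch.
        idx = max(range(len(sms)), key=lambda i: sms[i].free_threads)
        if sms[idx].free_threads >= threads:
            sms[idx].free_threads -= threads
            placement.append(idx)
        else:
            placement.append(None)
    return placement


if __name__ == "__main__":
    blocks = [512] * 4  # four blocks of 512 threads from a second kernel
    # SM 0 is already half occupied by another kernel.
    print(round_robin([SM(1024), SM(2048), SM(2048)], blocks))
    print(most_available([SM(1024), SM(2048), SM(2048)], blocks))
```

In this toy setup the round-robin policy sends blocks to the partially occupied SM 0, while the availability-based policy avoids it until the other SMs are equally loaded. This shows how the two policies can produce different placements when kernels run concurrently.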
