Improving Per-Node Computing Efficiency by an Adaptive Lock-Free Scheduling Model

Abstract
Job scheduling on many-core computers with tens or even hundreds of processing cores is one of the key technologies in High Performance Computing (HPC) systems. Despite many scheduling algorithms have been proposed, scheduling remains a challenge for executing highly effective jobs that are assigned in a single computing node with diverse scheduling objectives. On the other hand, the increasing scale and the need for rapid response to changing requirements are hard to meet with existing scheduling models in an HPC node. To address these issues, we propose a novel adaptive scheduling model that is applied to a single node with a many-core processor; this model solves the problems of scheduling efficiency and scalability through an adaptive optimistic control mechanism. This mechanism exposes information such that all the cores are provided with jobs and the tools necessary to take advantage of that information and thus compete for resources in an uncoordinated manner. At the same time, the mechanism is equipped with adaptive control, allowing it to adjust the number of running tools dynamically when frequent conflict happens. We justify this scheduling model and present the simulation results for synthetic and real-world HPC workloads, in which we compare our proposed model with two widely used scheduling models, i.e. multi-path monolithic and two-level scheduling. The proposed approach outperforms the other models in scheduling efficiency and scalability. Our results demonstrate that the adaptive optimistic control affords significant improvements for HPC workloads in the parallelism of the node-level scheduling model and performance.

This publication has 14 references indexed in Scilit: