Improving Per-Node Computing Efficiency by an Adaptive Lock-Free Scheduling Model
- 1 October 2018
- journal article
- research article
- Published by Institute of Electronics, Information and Communications Engineers (IEICE) in IEICE Transactions on Information and Systems
- Vol. E101.D (10), 2423-2435
- https://doi.org/10.1587/transinf.2018edp7038
Abstract
Job scheduling on many-core computers with tens or even hundreds of processing cores is one of the key technologies in High Performance Computing (HPC) systems. Although many scheduling algorithms have been proposed, scheduling remains a challenge for executing jobs effectively when they are assigned to a single computing node with diverse scheduling objectives. Moreover, existing scheduling models for an HPC node struggle to cope with increasing scale and with the need to respond rapidly to changing requirements. To address these issues, we propose a novel adaptive scheduling model for a single node with a many-core processor. The model solves the problems of scheduling efficiency and scalability through an adaptive optimistic control mechanism. This mechanism exposes scheduling information to all cores and provides them with jobs and the tools needed to exploit that information, so that the cores compete for resources in an uncoordinated manner. The mechanism is also equipped with adaptive control, which dynamically adjusts the number of running tools when conflicts become frequent. We justify this scheduling model and present simulation results for synthetic and real-world HPC workloads, comparing the proposed model with two widely used scheduling models: multi-path monolithic scheduling and two-level scheduling. The proposed approach outperforms both models in scheduling efficiency and scalability. Our results demonstrate that adaptive optimistic control significantly improves both the parallelism of the node-level scheduling model and performance for HPC workloads.
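The abstract describes adaptive optimistic control only at a high level. The toy simulation below shows one plausible reading of the general idea, and it is not the authors' model. Several schedulers read a shared view of the cores without locking, each claims cores for its job, and a claim commits only if none of its cores changed since the scheduler's snapshot. A controller then shrinks or grows the pool of active schedulers based on the observed conflict ratio. All names (`SharedCellState`, `try_commit`, `adapt`, `run`), the thresholds `lo`/`hi`, and the halve-or-add-one policy are hypothetical assumptions. Concurrency is simulated by giving every scheduler in a round the same stale snapshot. A real lock-free implementation would perform the check-and-set with an atomic compare-and-swap.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Core:
    owner: Optional[int] = None  # id of the job holding this core, None if idle
    version: int = 0             # incremented on every successful claim


class SharedCellState:
    """Hypothetical shared view of all cores: reads take no lock, writes are optimistic."""

    def __init__(self, n_cores: int) -> None:
        self.cores = [Core() for _ in range(n_cores)]

    def snapshot(self) -> List[Tuple[int, Optional[int], int]]:
        return [(i, c.owner, c.version) for i, c in enumerate(self.cores)]

    def try_commit(self, job_id: int, claims: List[Tuple[int, int]]) -> bool:
        # claims holds (core index, version seen in the snapshot).
        # All-or-nothing: abort if any claimed core changed since the snapshot.
        # A real lock-free scheduler would do this check-and-set atomically (CAS).
        if any(self.cores[i].version != v for i, v in claims):
            return False
        for i, _ in claims:
            self.cores[i].owner = job_id
            self.cores[i].version += 1
        return True


def adapt(active: int, conflict_ratio: float, lo: float, hi: float,
          min_active: int, max_active: int) -> int:
    """Assumed policy: halve the scheduler pool under heavy conflict, add one when conflicts are rare."""
    if conflict_ratio > hi:
        return max(min_active, active // 2)
    if conflict_ratio < lo:
        return min(max_active, active + 1)
    return active


def run(jobs: List[int], n_cores: int, max_active: int = 8, seed: int = 0,
        lo: float = 0.1, hi: float = 0.4, max_rounds: int = 10_000):
    """jobs[j] is the core count requested by job j; returns (state, active-scheduler history)."""
    rng = random.Random(seed)
    state = SharedCellState(n_cores)
    pending = list(range(len(jobs)))
    active, history = max_active, []
    for _ in range(max_rounds):
        if not pending:
            break
        history.append(active)
        batch, pending = pending[:active], pending[active:]
        snap = state.snapshot()  # every scheduler in this round works from the same view
        free = [(i, v) for i, owner, v in snap if owner is None]
        attempts = conflicts = 0
        retry = []
        for job in batch:
            if len(free) < jobs[job]:
                retry.append(job)
                continue
            attempts += 1
            # Each scheduler picks cores independently; disjoint picks all commit.
            if not state.try_commit(job, rng.sample(free, jobs[job])):
                conflicts += 1
                retry.append(job)
        pending = retry + pending  # conflicting jobs are retried next round
        active = adapt(active, conflicts / max(attempts, 1), lo, hi, 1, max_active)
    return state, history
```

For example, `run([2] * 24, n_cores=64)` places every job on two distinct cores. The first commit in each round always succeeds because nothing has changed since the snapshot, so the loop is guaranteed to make progress. Claims from the same stale snapshot that happen to be disjoint also commit, which is the benefit optimistic control trades against occasional retries. Halving on heavy conflict and adding one otherwise is an AIMD-style choice made for this sketch only.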