Parallel application memory scheduling

3 December 2011

conference paper
Published by Association for Computing Machinery (ACM)

pp. 362-373
https://doi.org/10.1145/2155620.2155663

Abstract

A primary use of chip-multiprocessor (CMP) systems is to speed up a single application by exploiting thread-level parallelism. In such systems, threads may slow each other down by issuing memory requests that interfere in the shared memory subsystem. This inter-thread memory system interference can significantly degrade parallel application performance. Better memory request scheduling may mitigate such performance degradation. However, previously proposed memory scheduling algorithms for CMPs are designed for multi-programmed workloads where each core runs an independent application, and thus do not take into account the inter-dependent nature of threads in a parallel application. In this paper, we propose a memory scheduling algorithm designed specifically for parallel applications. Our approach has two main components, targeting two common synchronization primitives that cause inter-dependence of threads: locks and barriers. First, the runtime system estimates threads holding the locks that cause the most serialization as the set of limiter threads, which are prioritized by the memory scheduler. Second, the memory scheduler shuffles thread priorities to reduce the time threads take to reach the barrier. We show that our memory scheduler speeds up a set of memory-intensive parallel applications by 12.6% compared to the best previous memory scheduling technique.
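
The two components described in the abstract can be illustrated with a small sketch. This is not the paper's implementation: the abstract does not give the actual priority rules, the shuffling policy, or the way limiter threads are estimated. The names `Request`, `pick_next`, and `shuffle_ranks` are hypothetical, the three-level priority order is an assumption, and a uniform random shuffle stands in for whatever shuffling scheme the authors use. The set of limiter threads is simply passed in, as if supplied by the runtime system.

```python
import random
from dataclasses import dataclass


@dataclass
class Request:
    thread_id: int  # thread that issued the memory request
    arrival: int    # cycle at which the request reached the controller


def pick_next(requests, limiter_threads, rank):
    """Choose the next request to service.

    Assumed priority order for this sketch:
      1. requests from limiter threads,
      2. requests from threads with a lower rank value,
      3. older requests first.
    """
    return min(
        requests,
        key=lambda r: (r.thread_id not in limiter_threads,
                       rank[r.thread_id],
                       r.arrival),
    )


def shuffle_ranks(thread_ids, rng):
    """Give every thread a fresh rank; meant to be called once per interval.

    A uniform random shuffle is an assumption made for illustration.
    """
    order = list(thread_ids)
    rng.shuffle(order)
    return {tid: position for position, tid in enumerate(order)}


# Example: thread 2 is assumed to hold a contended lock.
rng = random.Random(0)
rank = shuffle_ranks([0, 1, 2, 3], rng)
pending = [Request(0, 5), Request(1, 1), Request(2, 3)]
chosen = pick_next(pending, limiter_threads={2}, rank=rank)
print(chosen.thread_id)  # 2, because limiter threads always win
```

Re-running `shuffle_ranks` at every interval keeps any single non-limiter thread from being starved for long, which is one plausible reading of why shuffling would help threads arrive at a barrier closer together.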

Funding Information

Division of Computing and Communication Foundations (CAREER Award CCF-0953246)

Cited by 90 articles