Thread clustering
- 21 March 2007
- journal article
- Published by Association for Computing Machinery (ACM) in ACM SIGOPS Operating Systems Review
- Vol. 41 (3), 47-58
- https://doi.org/10.1145/1272998.1273004
Abstract
The major chip manufacturers have all introduced chip multiprocessing (CMP) and simultaneous multithreading (SMT) technology into their processing units. As a result, even low-end computing systems and game consoles have become shared memory multiprocessors with L1 and L2 cache sharing within a chip. Mid- and large-scale systems will have multiple processing chips and hence consist of an SMP-CMP-SMT configuration with non-uniform data sharing overheads. Current operating system schedulers are not aware of these new cache organizations, and as a result, distribute threads across processors in a way that causes many unnecessary, long-latency cross-chip cache accesses. In this paper we describe the design and implementation of a scheme to schedule threads based on sharing patterns detected online using features of standard performance monitoring units (PMUs) available in today's processing units. The primary advantage of using the PMU infrastructure is that it is fine-grained (down to the cache line) and has relatively low overhead. We have implemented our scheme in Linux running on an 8-way Power5 SMP-CMP-SMT multiprocessor. For commercial multithreaded server workloads (VolanoMark, SPECjbb, and RUBiS), we are able to demonstrate reductions in cross-chip cache accesses of up to 70%. These reductions lead to application-reported performance improvements of up to 7%.
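As an illustration of the general idea of grouping threads by sharing pattern, the following is a minimal sketch and not the paper's algorithm. It assumes each thread already has a "signature" vector of sampled access counts per memory region (a hypothetical stand-in for PMU samples), groups threads whose signatures are similar, and places each group on one chip. The names `cluster_threads`, `assign_to_chips`, and the `threshold` value are invented for this example.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity of two equal-length count vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_threads(signatures, threshold=0.5):
    """Greedy one-pass clustering: a thread joins the first cluster whose
    representative signature is similar enough, else it starts a new cluster.

    signatures: dict mapping thread id -> list of access counts (hypothetical data).
    """
    clusters = []  # list of (representative_signature, [thread ids])
    for tid, sig in sorted(signatures.items()):
        for rep, members in clusters:
            if cosine(rep, sig) >= threshold:
                members.append(tid)
                break
        else:
            clusters.append((sig, [tid]))
    return [members for _, members in clusters]


def assign_to_chips(clusters, num_chips):
    """Place each cluster on the currently least-loaded chip, largest cluster first."""
    load = [0] * num_chips
    placement = {}
    for members in sorted(clusters, key=len, reverse=True):
        chip = load.index(min(load))
        for tid in members:
            placement[tid] = chip
        load[chip] += len(members)
    return placement


if __name__ == "__main__":
    # Hypothetical signatures: threads 1 and 2 touch the same regions, as do 3 and 4.
    sigs = {1: [9, 8, 0, 0], 2: [7, 9, 0, 0], 3: [0, 0, 5, 6], 4: [0, 1, 6, 5]}
    groups = cluster_threads(sigs)
    print(groups)
    print(assign_to_chips(groups, num_chips=2))
```

In a real system the placement would be applied through the scheduler or a CPU affinity interface, and the signatures would come from hardware sampling rather than a hand-written dictionary.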
This publication has 10 references indexed in Scilit:
- Online performance analysis by statistical sampling of microprocessor performance counters. Published by Association for Computing Machinery (ACM), 2005
- Scheduling Algorithms for Effective Thread Pairing on Hybrid Multiprocessors. Published by Institute of Electrical and Electronics Engineers (IEEE), 2005
- Chip multithreading systems need a new operating system scheduler. Published by Association for Computing Machinery (ACM), 2004
- SEDA. Published by Association for Computing Machinery (ACM), 2001
- Symbiotic jobscheduling for a simultaneous multithreaded processor. Published by Association for Computing Machinery (ACM), 2000
- Data clustering. ACM Computing Surveys, 1999
- Performance counters and state sharing annotations. Published by Association for Computing Machinery (ACM), 1998
- Thread scheduling for cache locality. Published by Association for Computing Machinery (ACM), 1996
- The Performance Implications of Locality Information Usage in Shared-Memory Multiprocessors. Journal of Parallel and Distributed Computing, 1996
- TreadMarks: shared memory computing on networks of workstations. Computer, 1996