Using processor affinity in loop scheduling on shared-memory multiprocessors

Abstract
The authors consider a new dimension to the problem of loop scheduling on shared-memory multiprocessors: communication overhead caused by accesses to nonlocal data. It is shown that traditional algorithms for loop scheduling, which ignore the location of data when assigning iterations to processors, incur a significant performance penalty on modern shared-memory multiprocessors. The authors propose a loop scheduling algorithm that attempts to simultaneously balance the workload, minimize synchronization, and colocate loop iterations with the necessary data. They compare the performance of this algorithm to that of other known algorithms using four representative applications on a Silicon Graphics multiprocessor workstation, a BBN Butterfly, and a Sequent Symmetry, and they show that the algorithm offers substantial performance improvements, up to a factor of 3 in some cases. They conclude that loop scheduling algorithms for shared-memory multiprocessors cannot afford to ignore the location of data, particularly in light of the increasing disparity between processor and memory speeds.
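The sketch below illustrates the general idea of affinity-style loop scheduling described in the abstract; it is not the authors' exact algorithm. Each worker first drains its own partition of iterations (where its data is presumed to be cached or locally allocated) in decreasing chunks, and only then steals a fraction of the most loaded partition to rebalance. The constants N and NPROC, the chunk fraction, and the helpers work() and grab_chunk() are illustrative assumptions for this pthreads sketch.

```c
/* Sketch of an affinity-style loop scheduler (assumed parameters, not the
 * paper's exact algorithm): local partition first, then steal from the
 * most loaded partition. */
#include <pthread.h>
#include <stdio.h>

#define N      4096          /* total loop iterations (assumed) */
#define NPROC  4             /* number of worker threads (assumed) */

typedef struct {
    int lo, hi;              /* remaining iterations [lo, hi) */
    pthread_mutex_t lock;
} partition_t;

static partition_t part[NPROC];

static void work(int i) { (void)i; /* placeholder loop body */ }

/* Take roughly 1/NPROC of the remaining iterations of partition p. */
static int grab_chunk(int p, int *lo, int *hi)
{
    pthread_mutex_lock(&part[p].lock);
    int remaining = part[p].hi - part[p].lo;
    if (remaining <= 0) {
        pthread_mutex_unlock(&part[p].lock);
        return 0;
    }
    int take = (remaining + NPROC - 1) / NPROC;  /* ceil(remaining/NPROC) */
    *lo = part[p].lo;
    *hi = part[p].lo + take;
    part[p].lo += take;
    pthread_mutex_unlock(&part[p].lock);
    return 1;
}

static void *worker(void *arg)
{
    int me = (int)(long)arg;
    int lo, hi;

    /* Phase 1: run iterations from the local partition, so they execute
     * where their data is likely to reside. */
    while (grab_chunk(me, &lo, &hi))
        for (int i = lo; i < hi; i++) work(i);

    /* Phase 2: load balancing by stealing from whichever partition has
     * the most iterations left (simple linear scan). */
    for (;;) {
        int victim = -1, most = 0;
        for (int p = 0; p < NPROC; p++) {
            int rem = part[p].hi - part[p].lo;   /* unsynchronized peek */
            if (rem > most) { most = rem; victim = p; }
        }
        /* Stop if nothing is left, or the chosen victim was emptied in
         * the meantime (keeps termination simple in this sketch). */
        if (victim < 0 || !grab_chunk(victim, &lo, &hi)) break;
        for (int i = lo; i < hi; i++) work(i);
    }
    return NULL;
}

int main(void)
{
    pthread_t tid[NPROC];
    for (int p = 0; p < NPROC; p++) {
        part[p].lo = p * (N / NPROC);
        part[p].hi = (p == NPROC - 1) ? N : (p + 1) * (N / NPROC);
        pthread_mutex_init(&part[p].lock, NULL);
    }
    for (int p = 0; p < NPROC; p++)
        pthread_create(&tid[p], NULL, worker, (void *)(long)p);
    for (int p = 0; p < NPROC; p++)
        pthread_join(tid[p], NULL);
    puts("done");
    return 0;
}
```

Because each processor only synchronizes on a shared partition when it steals, this style of scheduler trades a small amount of load imbalance for fewer lock operations and better data locality than a single central work queue.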