Optimization of Collective Communication Operations in MPICH

1 February 2005

journal article
Published by SAGE Publications in The International Journal of High Performance Computing Applications

Vol. 19 (1), 49-66
https://doi.org/10.1177/1094342005051521

Abstract

We describe our work on improving the performance of collective communication operations in MPICH for clusters connected by switched networks. For each collective operation, we use multiple algorithms depending on the message size, with the goal of minimizing latency for short messages and minimizing bandwidth use for long messages. Although we have implemented new algorithms for all MPI (Message Passing Interface) collective operations, because of limited space we describe only the algorithms for allgather, broadcast, all-to-all, reduce-scatter, reduce, and allreduce. Performance results on a Myrinet-connected Linux cluster and an IBM SP indicate that, in all cases, the new algorithms significantly outperform the old algorithms used in MPICH on the Myrinet cluster, and, in many cases, they outperform the algorithms used in IBM's MPI on the SP. We also explore in further detail the optimization of two of the most commonly used collective operations, allreduce and reduce, particularly for long messages and non-power-of-two numbers of processes. The optimized algorithms for these operations perform several times better than the native algorithms on a Myrinet cluster, IBM SP, and Cray T3E. Our results indicate that to achieve the best performance for a collective communication operation, one needs to use a number of different algorithms and select the right algorithm for a particular message size and number of processes.
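
The trade-off the abstract describes (favoring latency for short messages and bandwidth for long ones) can be illustrated with a small cost-model sketch. This is not MPICH's selection code: it assumes a simple latency-bandwidth model in which a message of n bytes costs alpha + n*beta to send and n*gamma to reduce, assumes the number of processes is a power of two, and uses made-up parameter values. The function names `allreduce_costs` and `choose_allreduce` are hypothetical.

```python
import math


def allreduce_costs(p, n_bytes, alpha, beta, gamma):
    """Estimate the time of two allreduce strategies under an alpha-beta-gamma model.

    p       -- number of processes (assumed here to be a power of two)
    n_bytes -- message size in bytes
    alpha   -- per-message latency in seconds (illustrative value)
    beta    -- transfer time per byte in seconds (illustrative value)
    gamma   -- reduction compute time per byte in seconds (illustrative value)
    """
    lg = math.log2(p)
    frac = (p - 1) / p

    # Recursive doubling: log2(p) steps, each exchanging and reducing the
    # whole message. Few steps, so it is cheap when latency dominates.
    recursive_doubling = lg * alpha + n_bytes * lg * beta + n_bytes * lg * gamma

    # Reduce-scatter followed by allgather: twice as many steps, but each
    # process moves only about 2 * (p-1)/p of the message in total.
    reduce_scatter_allgather = (
        2 * lg * alpha + 2 * frac * n_bytes * beta + frac * n_bytes * gamma
    )

    return {
        "recursive_doubling": recursive_doubling,
        "reduce_scatter_allgather": reduce_scatter_allgather,
    }


def choose_allreduce(p, n_bytes, alpha=1e-5, beta=1e-9, gamma=1e-10):
    """Return the name of the strategy with the lower estimated cost."""
    costs = allreduce_costs(p, n_bytes, alpha, beta, gamma)
    return min(costs, key=costs.get)


if __name__ == "__main__":
    for size in (8, 1024, 1_000_000, 10_000_000):
        print(size, choose_allreduce(16, size))
```

With the illustrative parameters above, the model prefers the fewer-step strategy for very small messages and the lower-volume strategy for large ones. The crossover point depends entirely on the assumed alpha, beta, and gamma, which is why a real implementation needs to be tuned per platform.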

Cited by 574 articles