Generic topology mapping strategies for large-scale parallel architectures
- 31 May 2011
- conference paper
- conference paper
- Published by Association for Computing Machinery (ACM)
Abstract
The steadily increasing number of nodes in high-performance computing systems and the technology and power constraints lead to sparse network topologies. Efficient mapping of application communication patterns to the network topology gains importance as systems grow to petascale and beyond. Such mapping is supported in parallel programming frameworks such as MPI, but is often not well implemented. We show that the topology mapping problem is NP-complete and analyze and compare different practical topology mapping heuristics. We demonstrate an efficient and fast new heuristic which is based on graph similarity and show its utility with application communication patterns on real topologies. Our mapping strategies support heterogeneous networks and show significant reduction of congestion on torus, fat-tree, and the PERCS network topologies, for irregular communication patterns. We also demonstrate that the benefit of topology mapping grows with the network size and show how our algorithms can be used in a practical setting to optimize communication performance. Our efficient topology mapping strategies are shown to reduce network congestion by up to 80%, reduce average dilation by up to 50%, and improve benchmarked communication performance by 18%.Keywords
This publication has 13 references indexed in Scilit:
- The scalable process topology interface of MPI 2.2Concurrency and Computation: Practice and Experience, 2010
- Parallel static and dynamic multi‐constraint graph partitioningConcurrency and Computation: Practice and Experience, 2002
- End-to-end congestion control for the Internet: delays and stabilityIEEE/ACM Transactions on Networking, 2001
- A faster algorithm for betweenness centrality*The Journal of Mathematical Sociology, 2001
- How Good is Recursive Bisection?SIAM Journal on Scientific Computing, 1997
- Heuristic technique for processor and link assignment in multicomputersIEEE Transactions on Computers, 1991
- Threshold accepting: A general purpose optimization algorithm appearing superior to simulated annealingJournal of Computational Physics, 1990
- A Mapping Strategy for Parallel ProcessingIEEE Transactions on Computers, 1987
- On the Mapping ProblemIEEE Transactions on Computers, 1981
- Reducing the bandwidth of sparse symmetric matricesPublished by Association for Computing Machinery (ACM) ,1969