REDUCING THE BULK IN THE BULK SYNCHRONOUS PARALLEL MODEL
- 29 December 2013
- journal article
- Published by World Scientific Pub Co Pte Ltd in Parallel Processing Letters
- Vol. 23 (4), 1340010
- https://doi.org/10.1142/s0129626413400100
Abstract
For over two decades the dominant means for enabling portable performance of computational science and engineering applications on parallel processing architectures has been the bulk-synchronous parallel programming (BSP) model. Code developers, motivated by performance considerations to minimize the number of messages transmitted, have typically pursued a strategy of aggregating message data into fewer, larger messages. Emerging and future high-performance architectures, especially those seen as targeting Exascale capabilities, provide motivation and capabilities for revisiting this approach. In this paper we explore alternative configurations within the context of a large-scale complex multi-physics application and a proxy that represents its behavior, presenting results that demonstrate some important advantages as the number of processors increases in scale.
This publication has 5 references indexed in Scilit:
- Application‐driven analysis of two generations of capability computing: the transition to multicore processors. Concurrency and Computation: Practice and Experience, 2012
- The scalable process topology interface of MPI 2.2. Concurrency and Computation: Practice and Experience, 2010
- Technology-Driven, Highly-Scalable Dragonfly Topology. ACM SIGARCH Computer Architecture News, 2008
- The NAS Parallel Benchmarks. The International Journal of Supercomputing Applications, 1991
- A bridging model for parallel computation. Communications of the ACM, 1990