REDUCING THE BULK IN THE BULK SYNCHRONOUS PARALLEL MODEL

29 December 2013

journal article
Published by World Scientific Pub Co Pte Ltd in Parallel Processing Letters

Vol. 23 (4), 1340010
https://doi.org/10.1142/s0129626413400100

Abstract

For over two decades the dominant means for enabling portable performance of computational science and engineering applications on parallel processing architectures has been the bulk-synchronous parallel programming (BSP) model. Code developers, motivated by performance considerations to minimize the number of messages transmitted, have typically pursued a strategy of aggregating message data into fewer, larger messages. Emerging and future high-performance architectures, especially those seen as targeting Exascale capabilities, provide motivation and capabilities for revisiting this approach. In this paper we explore alternative configurations within the context of a large-scale complex multi-physics application and a proxy that represents its behavior, presenting results that demonstrate some important advantages as the number of processors increases in scale.
