PAMI: A Parallel Active Message Interface for the Blue Gene/Q Supercomputer
- 1 May 2012
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- p. 763-773
- https://doi.org/10.1109/ipdps.2012.73
Abstract
The Blue Gene/Q machine is the next generation in the line of IBM massively parallel supercomputers, designed to scale to 262144 nodes and sixteen million threads. With each BG/Q node having 68 hardware threads, hybrid programming paradigms, which use message passing among nodes and multi-threading within nodes, are ideal and will enable applications to achieve high throughput on BG/Q. Given such unprecedented parallelism and scale, this paper is a groundbreaking effort to explore the challenges of designing a communication library that can match and exploit that parallelism. In particular, we present the Parallel Active Messaging Interface (PAMI) library as our BG/Q solution to the many challenges that come with a machine at such scale. PAMI provides (1) novel techniques to partition the application communication overhead into many contexts that can be accelerated by communication threads, (2) client and context objects to support multiple and different programming paradigms, (3) lockless algorithms to speed up MPI message rate, and (4) novel techniques leveraging the new BG/Q architectural features such as the scalable atomic primitives implemented in the L2 cache, the highly parallel hardware messaging unit that supports both point-to-point and collective operations, and the collective hardware acceleration for operations such as broadcast, reduce, and allreduce. We experimented with PAMI on 2048 BG/Q nodes, and the results show high messaging rates as well as low latencies and high throughput for collective communication operations.
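As a rough illustration of point (1) in the abstract, the sketch below partitions a stream of messages across several independent contexts, each advanced by its own thread. It is not the PAMI API: the names Context, post, advance, and run_contexts are hypothetical, and the sketch only shows why giving each context a private queue with a single owning thread removes the need for a shared lock.

```python
import threading
from collections import deque


class Context:
    """Hypothetical stand-in for one communication context (not a PAMI type)."""

    def __init__(self, context_id):
        self.context_id = context_id
        self.pending = deque()   # private work queue for this context
        self.completed = []      # messages processed by the owning thread

    def post(self, message):
        # Called only before the owning thread starts, so no lock is taken.
        self.pending.append(message)

    def advance(self):
        # Drain the private queue. Only the owning thread calls this,
        # so contexts never contend with each other.
        while self.pending:
            self.completed.append(self.pending.popleft())


def run_contexts(messages, num_contexts):
    """Spread messages round-robin over contexts, one thread per context."""
    contexts = [Context(i) for i in range(num_contexts)]
    for i, message in enumerate(messages):
        contexts[i % num_contexts].post(message)

    threads = [threading.Thread(target=c.advance) for c in contexts]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return contexts


if __name__ == "__main__":
    for ctx in run_contexts(list(range(10)), 4):
        print(ctx.context_id, ctx.completed)
```

The round-robin assignment is an assumption made for simplicity; the abstract does not say how work is mapped to contexts.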