Multicore-Optimized Wavefront Diamond Blocking for Optimizing Stencil Updates

1 January 2015

journal article
Published by Society for Industrial & Applied Mathematics (SIAM) in SIAM Journal on Scientific Computing

Vol. 37 (4), C439-C464
https://doi.org/10.1137/140991133

Abstract

The importance of stencil-based algorithms in computational science has focused attention on optimized parallel implementations for multilevel cache-based processors. Temporal blocking schemes leverage the large bandwidth and low latency of caches to accelerate stencil updates and approach theoretical peak performance. A key ingredient is the reduction of data traffic across slow data paths, especially the main memory interface. In this work we combine the ideas of multicore wavefront temporal blocking and diamond tiling to arrive at stencil update schemes that show large reductions in memory pressure compared to existing approaches. The resulting schemes show performance advantages in bandwidth-starved situations, which are exacerbated in the variable-coefficient case by its high number of bytes per lattice update. Our thread groups concept provides a controllable trade-off between concurrency and memory usage, shifting the pressure between the memory interface and the CPU. We present performance results on a contemporary Intel processor.
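The core idea the abstract builds on is reordering stencil updates in space and time so that data brought into cache is reused for several time steps before it goes back to memory. The Python sketch below illustrates only the diamond-tiling half of that idea on a 1D three-point Jacobi stencil. It is a minimal, non-authoritative illustration under stated assumptions: it is not the paper's multicore wavefront diamond blocking scheme, it has no wavefronts, threads, or thread groups, and it does not model caches. The function names `jacobi_reference` and `jacobi_diamond`, the fixed-boundary three-point average, and the tile width `W` are hypothetical choices made for this example. Tile (a, b) collects the points with a*W <= x+t < (a+1)*W and b*W <= x-t < (b+1)*W. Running tiles in increasing order of the level a - b, and each tile in increasing t, respects every dependence, and tiles that share a level do not depend on one another.

```python
def jacobi_reference(u0, T):
    """Naive sweeps: T steps of u[x] <- (u[x-1] + u[x] + u[x+1]) / 3, fixed ends."""
    cur = list(u0)
    for _ in range(T):
        nxt = list(cur)
        for x in range(1, len(cur) - 1):
            nxt[x] = (cur[x - 1] + cur[x] + cur[x + 1]) / 3.0
        cur = nxt
    return cur


def jacobi_diamond(u0, T, W):
    """Same update (hypothetical example), traversed in diamond tiles of width W.

    Tile (a, b) holds the (t, x) points with a*W <= x+t < (a+1)*W and
    b*W <= x-t < (b+1)*W. Every dependence of (t, x) lies in the same tile
    at an earlier t or in a tile with a smaller level a - b, so tiles are
    run level by level; tiles sharing a level are mutually independent.
    """
    N = len(u0)
    # buf[t % 2] holds time level t; the fixed boundary values sit in both.
    # Two buffers suffice because this ordering finishes every reader of
    # level t-2 at position x before (t, x) overwrites it.
    buf = [list(u0), list(u0)]
    # Interior points satisfy 1 <= x <= N-2 and 1 <= t <= T.
    a_min, a_max = 2 // W, (N - 2 + T) // W
    b_min, b_max = (1 - T) // W, (N - 3) // W
    for level in range(a_min - b_max, a_max - b_min + 1):
        for b in range(b_min, b_max + 1):  # same level: could run in parallel
            a = level + b
            if not a_min <= a <= a_max:
                continue
            # Inside this tile, 2t lies in [(level-1)*W + 1, (level+1)*W - 1].
            t_lo = max(1, -(-((level - 1) * W + 1) // 2))  # ceiling division
            t_hi = min(T, ((level + 1) * W - 1) // 2)
            for t in range(t_lo, t_hi + 1):
                src, dst = buf[(t - 1) % 2], buf[t % 2]
                # x range for this time step, clipped to the tile and the interior.
                x_lo = max(1, a * W - t, b * W + t)
                x_hi = min(N - 2, (a + 1) * W - 1 - t, (b + 1) * W - 1 + t)
                for x in range(x_lo, x_hi + 1):
                    dst[x] = (src[x - 1] + src[x] + src[x + 1]) / 3.0
    return buf[T % 2]


u0 = [1.0, 0.0, 0.0, 0.0, 2.0]
print(jacobi_diamond(u0, 3, 2) == jacobi_reference(u0, 3))
```

Because the tiled traversal applies exactly the same floating-point operations to the same operands as the naive sweep, the two results agree bit for bit. In an optimized code the payoff of the reordering is that a tile's working set can stay in cache, so that, ideally, data crosses the memory interface once per tile rather than once per time step.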
