Enhancing the Scalability of Multi-FPGA Stencil Computations via Highly Optimized HDL Components

Abstract
Stencil-based algorithms are an important class of computational kernels in high-performance systems, as they appear in a wide range of fields, from image processing to seismic simulation, and from numerical methods to physical modeling. Among the various incarnations of stencil-based computation, Iterative Stencil Loops (ISLs) and Convolutional Neural Networks (CNNs) are two well-known examples of kernels in this class: ISLs repeatedly apply the same stencil until convergence, while CNN layers leverage stencils to extract features from an image. The computationally intensive nature of ISLs, CNNs, and stencil-based workloads in general calls for solutions that produce implementations that are efficient in terms of both throughput and power. In this context, FPGAs are ideal candidates for such workloads, as they allow designing architectures tailored to the regular computational pattern of stencils. Moreover, the ever-growing need for higher performance pushes FPGA-based architectures to scale across multiple devices to benefit from distributed acceleration. For these reasons, we propose a library of HDL components to efficiently compute ISLs and CNN inference on FPGAs, along with a scalable multi-FPGA architecture based on custom PCB interconnects. Our solution eases the design flow and achieves scalability and performance competitive with state-of-the-art works.
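
To make the computational pattern concrete, the sketch below shows a textbook Jacobi-style ISL in plain C. This is a generic illustration of the kernel class, not the HDL components proposed in this work: the same 5-point stencil sweeps every interior cell on each iteration, and a fixed trip count ITERS stands in for an actual convergence test (N and ITERS are arbitrary placeholders).

#include <string.h>

#define N     128   /* grid side length (arbitrary, for illustration only) */
#define ITERS 100   /* fixed trip count standing in for a convergence test */

/* Jacobi-style 5-point ISL: the same stencil is applied to the whole
 * interior grid on every iteration, reading from one buffer and writing
 * to the other. Border cells act as fixed boundary conditions. */
void jacobi2d(float a[N][N], float b[N][N]) {
    memcpy(b, a, sizeof(float[N][N]));      /* carry boundary values over */
    for (int t = 0; t < ITERS; t++) {
        for (int i = 1; i < N - 1; i++)
            for (int j = 1; j < N - 1; j++)
                b[i][j] = 0.25f * (a[i - 1][j] + a[i + 1][j] +
                                   a[i][j - 1] + a[i][j + 1]);
        memcpy(a, b, sizeof(float[N][N]));  /* buffer "swap" via copy, for clarity */
    }
}

The regularity visible here (a fixed neighborhood reused across the grid and across iterations) is exactly what makes such kernels amenable to tailored FPGA architectures.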