Enhancing the Scalability of Multi-FPGA Stencil Computations via Highly Optimized HDL Components
- 12 August 2021
- journal article
- research article
- Published by Association for Computing Machinery (ACM) in ACM Transactions on Reconfigurable Technology and Systems
- Vol. 14 (3), 1-33
- https://doi.org/10.1145/3461478
Abstract
Stencil-based algorithms are a relevant class of computational kernels in high-performance systems, as they appear in a plethora of fields, from image processing to seismic simulations, from numerical methods to physical modeling. Among the various incarnations of stencil-based computations, Iterative Stencil Loops (ISLs) and Convolutional Neural Networks (CNNs) represent two well-known examples of kernels belonging to the stencil class. Indeed, ISLs apply the same stencil several times until convergence, while CNN layers leverage stencils to extract features from an image. The computationally intensive nature of ISLs, CNNs, and stencil-based workloads in general requires solutions able to produce efficient implementations in terms of throughput and power efficiency. In this context, FPGAs are ideal candidates for such workloads, as they allow designing architectures tailored to the regular computational pattern of stencils. Moreover, the ever-growing need for performance enhancement leads FPGA-based architectures to scale to multiple devices to benefit from distributed acceleration. For this reason, we propose a library of HDL components to effectively compute ISLs and CNN inference on FPGA, along with a scalable multi-FPGA architecture based on custom PCB interconnects. Our solution eases the design flow and guarantees both scalability and performance competitive with state-of-the-art works.
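To make the notion of an Iterative Stencil Loop concrete, the following is a minimal software sketch, not the HDL architecture described in the article. It assumes a 2D grid with fixed boundary values and a four-neighbour averaging (Jacobi-style) stencil; the function names `jacobi_step` and `iterate_until_converged`, the tolerance, and the iteration cap are all hypothetical choices made for illustration.

```python
def jacobi_step(grid):
    """Apply one sweep of a 4-neighbour averaging stencil.

    Boundary cells are copied unchanged; only interior cells are updated.
    The input grid is not modified.
    """
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            new[i][j] = 0.25 * (
                grid[i - 1][j] + grid[i + 1][j] + grid[i][j - 1] + grid[i][j + 1]
            )
    return new


def iterate_until_converged(grid, tol=1e-6, max_iters=10_000):
    """Repeat the same stencil until the largest per-cell change drops below tol.

    Returns the final grid and the number of sweeps performed.
    """
    for iteration in range(1, max_iters + 1):
        new = jacobi_step(grid)
        delta = max(
            abs(new[i][j] - grid[i][j])
            for i in range(len(grid))
            for j in range(len(grid[0]))
        )
        grid = new
        if delta < tol:
            return grid, iteration
    return grid, max_iters


if __name__ == "__main__":
    # 3x3 grid: boundary held at 1.0, single interior cell starts at 0.0.
    start = [[1.0, 1.0, 1.0], [1.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
    result, sweeps = iterate_until_converged(start)
    print(result[1][1], sweeps)
```

Each sweep reads only a fixed neighbourhood of every cell, which is the regular access pattern the abstract refers to when it describes FPGAs as a good fit for stencil workloads.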