High Performance Communication on Reconfigurable Clusters

Abstract
FPGA clusters with the FPGAs directly linked through their Multi-Gigabit Transceivers (MGT) have a proven advantage over other commodity architectures for communication-bound applications. To date, however, communication infrastructure for such clusters has generally taken one of two approaches: nearest neighbor only, which is fast but has limited utility, and processor-based, which is general, but relatively slow. What is needed is for communication microarchitecture of these systems to be systematically explored, as has been done for HPC clusters and for Networks on Chip (NoC) on both FPGAs and ASICs. Our first contribution is finding that the properties of clusters of tightly coupled FPGAs substantially influence the router design space. We create a candidate router and generalize it so that it is parameterized by routing algorithm, arbitration policy, and virtual channels (VC). We have created a cycle-accurate simulator validated on a four-FPGA system. We evaluate the design space with respect to a number of standard communication patterns and packet sizes. These results enable selection of the appropriate router for any resource budget. We find that the optimality of the router design varies significantly with workloads. We present a framework that helps to determine appropriate parameters based on different applications and generate the HDL design. We observe that for a 512 FPGA cluster, compared with the router configuration with the best average performance, application-aware router selection can lead to substantial improvement in performance or reduction in area.

This publication has 15 references indexed in Scilit: