High-Performance Reduction Circuits Using Deeply Pipelined Operators on FPGAs

17 September 2007

journal article
Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Parallel and Distributed Systems

Vol. 18 (10), 1377-1392
https://doi.org/10.1109/tpds.2007.1068

Abstract

Field-programmable gate arrays (FPGAs) have become an attractive option for accelerating scientific applications. Many scientific operations such as matrix-vector multiplication and dot product involve the reduction of a sequentially produced stream of values. Unfortunately, because of the pipelining in FPGA-based floating-point units, data hazards may occur during these sequential reduction operations. Improperly designed reduction circuits can adversely impact the performance, impose unrealistic buffer requirements, and consume a significant portion of the FPGA. In this paper, we identify two basic methods for designing serial reduction circuits: the tree-traversal method and the striding method. Using accumulation as an example, we analyze the design trade-offs among the number of adders, buffer size, and latency. We then propose high-performance and area-efficient designs using each method. The proposed designs reduce multiple sets of sequentially delivered floating-point values without stalling the pipeline or imposing unrealistic buffer requirements. Using a Xilinx Virtex-II Pro FPGA as the target device, we implemented our designs and present performance and area results.
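
The data hazard mentioned in the abstract can be illustrated with a small software model. The sketch below is not the paper's circuit: it only shows the general idea behind striding, under the assumption that the adder pipeline has a depth of `latency` stages. The function name `strided_sum` and its parameters are hypothetical. Each input is added to one of `latency` independent partial sums, so an addition never needs a result that is still inside the pipeline, and the partial sums are combined pairwise at the end.

```python
def strided_sum(values, latency):
    """Software model of strided accumulation (illustrative assumption,
    not the design proposed in the paper).

    `latency` stands for the depth of a pipelined adder. Input i goes to
    lane i % latency, so consecutive inputs never depend on each other.
    """
    if latency < 1:
        raise ValueError("latency must be at least 1")

    # One independent partial sum per pipeline stage.
    partial = [0.0] * latency
    for i, v in enumerate(values):
        # The previous add into this lane was issued `latency` inputs ago,
        # so in hardware its result would already have left the pipeline.
        partial[i % latency] += v

    # Combine the partial sums pairwise until one value remains.
    while len(partial) > 1:
        combined = [partial[j] + partial[j + 1]
                    for j in range(0, len(partial) - 1, 2)]
        if len(partial) % 2:
            combined.append(partial[-1])  # odd element carries over
        partial = combined
    return partial[0]


if __name__ == "__main__":
    data = [float(x) for x in range(1, 11)]
    print(strided_sum(data, 4))
```

Because floating-point addition is not associative, a strided sum can differ in the last bits from a strictly sequential sum of the same values; the example uses small integer-valued inputs, for which both orders agree exactly.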

