High-Performance Reduction Circuits Using Deeply Pipelined Operators on FPGAs
- 17 September 2007
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Parallel and Distributed Systems
- Vol. 18 (10), 1377-1392
- https://doi.org/10.1109/tpds.2007.1068
Abstract
Field-programmable gate arrays (FPGAs) have become an attractive option for accelerating scientific applications. Many scientific operations such as matrix-vector multiplication and dot product involve the reduction of a sequentially produced stream of values. Unfortunately, because of the pipelining in FPGA-based floating-point units, data hazards may occur during these sequential reduction operations. Improperly designed reduction circuits can adversely impact the performance, impose unrealistic buffer requirements, and consume a significant portion of the FPGA. In this paper, we identify two basic methods for designing serial reduction circuits: the tree-traversal method and the striding method. Using accumulation as an example, we analyze the design trade-offs among the number of adders, buffer size, and latency. We then propose high-performance and area-efficient designs using each method. The proposed designs reduce multiple sets of sequentially delivered floating-point values without stalling the pipeline or imposing unrealistic buffer requirements. Using a Xilinx Virtex-II Pro FPGA as the target device, we implemented our designs and present performance and area results.
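The data hazard described in the abstract can be illustrated with a small software model. The sketch below is not the authors' circuit; it is a minimal Python simulation, written in the spirit of the striding idea, of an adder whose result only becomes available a fixed number of cycles after issue. The function name `strided_accumulate` and the parameter `depth` (the assumed adder pipeline latency in cycles) are hypothetical and chosen for illustration only.

```python
from collections import deque


def strided_accumulate(values, depth):
    """Sum `values` using a simulated adder with a latency of `depth` cycles.

    Illustrative assumption: one input arrives per cycle, and a result
    issued in cycle t leaves the adder pipeline in cycle t + depth.
    """
    # In-flight partial sums, oldest first. Starting with `depth` zeros
    # means the first `depth` inputs are each added to an empty lane.
    pipeline = deque([0.0] * depth)
    for x in values:
        # The oldest in-flight result emerges this cycle. It belongs to the
        # same lane (input index mod depth) as x, so the two can be paired
        # immediately and no stall is needed.
        partial = pipeline.popleft()
        # The new partial sum re-enters the pipeline and reappears
        # `depth` cycles later.
        pipeline.append(partial + x)
    # After the stream ends, `depth` interleaved partial sums remain.
    # Combining them takes extra cycles in hardware; this sketch does not
    # model that final step and simply adds them up.
    return sum(pipeline)


print(strided_accumulate([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], depth=4))
```

Because the inputs are split across `depth` interleaved lanes, the additions happen in a different order than in a plain sequential sum, so results for general floating-point inputs may differ slightly in the last bits. The model also handles only a single input set; the abstract's concern with multiple back-to-back sets and buffer sizing is outside what this sketch shows.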
This publication has 6 references indexed in Scilit:
- Advanced Components in the Variable Precision Floating-Point Library. Published by Institute of Electrical and Electronics Engineers (IEEE), 2006
- A Hybrid Approach for Mapping Conjugate Gradient onto an FPGA-Augmented Reconfigurable Supercomputer. Published by Institute of Electrical and Electronics Engineers (IEEE), 2006
- An FPGA-Based Application-Specific Processor for Efficient Reduction of Multiple Variable-Length Floating-Point Data Sets. Published by Institute of Electrical and Electronics Engineers (IEEE), 2006
- Sparse Matrix-Vector multiplication on FPGAs. Published by Association for Computing Machinery (ACM), 2005
- Closing the Gap: CPU and FPGA Trends in Sustainable Floating-Point BLAS Performance. Published by Institute of Electrical and Electronics Engineers (IEEE), 2004
- Vector reduction methods for arithmetic pipelines. Published by Institute of Electrical and Electronics Engineers (IEEE), 1983