An I/O Bandwidth-Sensitive Sparse Matrix-Vector Multiplication Engine on FPGAs
- 12 August 2011
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Circuits and Systems I: Regular Papers
- Vol. 59 (1), 113-123
- https://doi.org/10.1109/tcsi.2011.2161389
Abstract
Sparse matrix-vector multiplication (SMVM) is a fundamental core of many high-performance computing applications, including information retrieval, medical imaging, and economic modeling. While the use of reconfigurable computing technology in a high-performance computing environment has shown recent promise in accelerating a wide variety of scientific applications, existing SMVM architectures on FPGA hardware have been limited in that they require either numerous pipeline stalls during computation (due to zero padding) or excessive input preprocessing during run-time. For large-scale sparse matrix scenarios, both of these shortcomings can result in unacceptable performance overheads, limiting the overall value of using FPGAs in a high-performance computing environment. In this paper, we present a scalable and efficient FPGA-based SMVM architecture which can handle arbitrary matrix sparsity patterns without excessive preprocessing or zero padding and can be dynamically expanded based on the available I/O bandwidth. Our experimental results using a commercial FPGA-based acceleration system demonstrate that our reconfigurable SMVM engine is highly efficient, with benchmark-dependent speedups over an optimized software implementation that range from to in terms of computation time.
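As a rough illustration of the operation the abstract describes, the following software sketch computes y = Ax over the stored nonzeros only, then counts how many multiply slots would be needed if every row were zero-padded to the length of the longest row. The compressed sparse row (CSR) layout, the padded-row scheme, and all function names here are assumptions chosen for illustration; the abstract does not state which storage format or padding scheme the paper or the prior architectures use, and this is not the FPGA design itself.

```python
# Illustrative sketch only: CSR storage and the padded-row layout are
# assumptions, not formats taken from the paper.

def to_csr(dense):
    """Convert a dense row-major matrix (list of lists) to CSR arrays."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, a in enumerate(row):
            if a != 0:
                values.append(a)
                col_idx.append(j)
        row_ptr.append(len(values))  # end offset of this row's nonzeros
    return values, col_idx, row_ptr


def smvm_csr(values, col_idx, row_ptr, x):
    """Compute y = A x, touching only the stored nonzeros of A."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


def padded_slots(row_ptr):
    """Multiply slots needed if every row is zero-padded to the longest row."""
    lengths = [row_ptr[i + 1] - row_ptr[i] for i in range(len(row_ptr) - 1)]
    return max(lengths) * len(lengths) if lengths else 0


A = [
    [4, 0, 0, 1],
    [0, 0, 2, 0],
    [0, 3, 0, 0],
    [5, 0, 6, 7],
]
x = [1.0, 2.0, 3.0, 4.0]

values, col_idx, row_ptr = to_csr(A)
y = smvm_csr(values, col_idx, row_ptr, x)
nnz = len(values)               # useful multiplications
slots = padded_slots(row_ptr)   # multiplications after zero padding
```

For this small matrix, 7 nonzeros occupy 12 padded slots, so 5 of the 12 multiplications would operate on padding zeros. On matrices with very uneven row lengths that fraction grows, which is the kind of overhead the abstract attributes to zero padding.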