Efficient Block Algorithms for Parallel Sparse Triangular Solve

Abstract

The sparse triangular solve (SpTRSV) kernel is an important building block for many linear algebra routines, such as sparse direct and iterative solvers. The main challenge in accelerating SpTRSV is exposing more parallelism. Existing work focuses mainly on reducing dependencies and synchronizations in level-set methods, whereas the 2D block layout of the input matrix has been largely ignored in the design of more efficient SpTRSV algorithms. In this paper, we implement three block algorithms (column block, row block, and recursive block) for parallel SpTRSV on modern GPUs, and propose an adaptive approach that automatically selects the best kernels according to the sparsity structure of the input. In experiments on 159 sparse matrices and two high-end NVIDIA GPUs, the recursive block algorithm performs best among the three block algorithms, and is on average 4.72x (up to 72.03x) faster than cuSPARSE v2 and 9.95x (up to 61.08x) faster than the Sync-free method. In addition, our method requires only a moderate cost for preprocessing the input matrix, which makes it well suited to multiple right-hand sides and iterative scenarios, where that cost is amortized over repeated solves.
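The recursive block idea can be illustrated with a small sequential sketch. It is not the paper's GPU implementation. It assumes the textbook 2x2 splitting of a lower-triangular matrix: solve the top-left triangle, subtract the bottom-left rectangular block times that partial solution from the remaining right-hand side, then solve the bottom-right triangle. The function name `recursive_block_sptrsv`, the `leaf` parameter, and the row-wise `(column, value)` storage are all hypothetical choices made for this example.

```python
def recursive_block_sptrsv(rows, b, leaf=2):
    """Solve L x = b for a sparse lower-triangular L.

    rows[i] is a list of (column, value) pairs for row i, including the
    diagonal entry. This layout is an assumption made for the sketch.
    """
    x = [float(v) for v in b]  # holds the right-hand side, overwritten by x

    def solve(lo, hi):
        if hi - lo <= leaf:
            # Small diagonal block: plain forward substitution.
            for i in range(lo, hi):
                diag = None
                for j, v in rows[i]:
                    if lo <= j < i:
                        x[i] -= v * x[j]
                    elif j == i:
                        diag = v
                x[i] /= diag
            return
        mid = (lo + hi) // 2
        solve(lo, mid)  # top-left triangular block
        # Rectangular block update: an SpMV-like step with no dependencies
        # between rows, which is where a parallel kernel would be used.
        for i in range(mid, hi):
            for j, v in rows[i]:
                if lo <= j < mid:
                    x[i] -= v * x[j]
        solve(mid, hi)  # bottom-right triangular block

    solve(0, len(b))
    return x


# L = [[2,0,0,0],[1,1,0,0],[0,3,4,0],[1,0,2,2]] stored row by row
rows = [
    [(0, 2.0)],
    [(0, 1.0), (1, 1.0)],
    [(1, 3.0), (2, 4.0)],
    [(0, 1.0), (2, 2.0), (3, 2.0)],
]
b = [2.0, 3.0, 18.0, 15.0]
x = recursive_block_sptrsv(rows, b)
```

In this splitting only the small triangular leaves carry row-to-row dependencies; the rectangular updates between them are independent per row, which is one plausible reason a block layout exposes more parallelism than solving the whole triangle row by row.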

