Designing vector-friendly compact BLAS and LAPACK kernels

12 November 2017

conference paper
Published by Association for Computing Machinery (ACM)

Abstract

No abstract available

This publication has 19 references indexed in Scilit:

Revisiting parallel cyclic reduction and parallel prefix-based algorithms for block tridiagonal systems of equations
Journal of Parallel and Distributed Computing, 2013
The Role of Thermal Pressurization and Dilatancy in Controlling the Rate of Fault Slip
Journal of Applied Mechanics, 2012
Roofline: an insightful visual performance model for multicore architectures
Communications of the ACM, 2009
High-performance implementation of the level-3 BLAS
ACM Transactions on Mathematical Software, 2008
Anatomy of high-performance matrix multiplication
ACM Transactions on Mathematical Software, 2008
Pin: building customized program analysis tools with dynamic instrumentation
ACM SIGPLAN Notices, 2005
Data-Parallel Line Relaxation Method for the Navier-Stokes Equations
AIAA Journal, 1998
Efficient parallel computation of unstructured finite element reacting flow solutions
Parallel Computing, 1997
A set of level 3 basic linear algebra subprograms
ACM Transactions on Mathematical Software, 1990
Basic Linear Algebra Subprograms for Fortran Usage
ACM Transactions on Mathematical Software, 1979

Cited by 27 articles