Designing vector-friendly compact BLAS and LAPACK kernels
- 12 November 2017
- conference paper
- Published by Association for Computing Machinery (ACM)
Abstract
No abstract available.
This publication has 19 references indexed in Scilit:
- Revisiting parallel cyclic reduction and parallel prefix-based algorithms for block tridiagonal systems of equations. Journal of Parallel and Distributed Computing, 2013
- The Role of Thermal Pressurization and Dilatancy in Controlling the Rate of Fault Slip. Journal of Applied Mechanics, 2012
- Roofline. Communications of the ACM, 2009
- High-performance implementation of the level-3 BLAS. ACM Transactions on Mathematical Software, 2008
- Anatomy of high-performance matrix multiplication. ACM Transactions on Mathematical Software, 2008
- Pin. ACM SIGPLAN Notices, 2005
- Data-Parallel Line Relaxation Method for the Navier-Stokes Equations. AIAA Journal, 1998
- Efficient parallel computation of unstructured finite element reacting flow solutions. Parallel Computing, 1997
- A set of level 3 basic linear algebra subprograms. ACM Transactions on Mathematical Software, 1990
- Basic Linear Algebra Subprograms for Fortran Usage. ACM Transactions on Mathematical Software, 1979