FLAME: Formal Linear Algebra Methods Environment
- 1 December 2001
- journal article
- Published by Association for Computing Machinery (ACM) in ACM Transactions on Mathematical Software
- Vol. 27 (4), 422-455
- https://doi.org/10.1145/504210.504213
Abstract

Since the advent of high-performance distributed-memory parallel computing, the need for intelligible code has become ever greater. The development and maintenance of libraries for these architectures is simply too complex to be amenable to conventional approaches to implementation. Attempts to employ traditional methodology have led, in our opinion, to the production of an abundance of anfractuous code that is difficult to maintain and almost impossible to upgrade.

Having struggled with these issues for more than a decade, we have concluded that a solution is to apply a technique from theoretical computer science, formal derivation, to the development of high-performance linear algebra libraries. We believe this approach yields aesthetically pleasing, coherent code that greatly facilitates intelligent modularity and high performance while enhancing confidence in its correctness. Since the technique is language-independent, it lends itself equally well to a wide spectrum of programming languages (and paradigms) ranging from C and Fortran to C++ and Java.

In this paper, we illustrate our observations by looking at the Formal Linear Algebra Methods Environment (FLAME), a framework that facilitates the derivation and implementation of linear algebra algorithms on sequential architectures. This environment demonstrates that lessons learned in the distributed-memory world can guide us toward better approaches even in the sequential world. We present performance experiments on the Intel(R) Pentium(R) III processor that demonstrate that high performance can be attained by coding at a high level of abstraction.
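The abstract's central idea is expressing dense linear algebra algorithms as loops over a partitioned matrix, where each iteration advances a block boundary while maintaining a loop invariant. The following is a minimal illustrative sketch in Python/NumPy of that partitioned-loop style, using a blocked Cholesky factorization; it is not the FLAME API itself, and the function name and block size are chosen here for illustration.

```python
import numpy as np

def chol_flame_style(A, nb=2):
    """Blocked Cholesky factorization written in the spirit of a
    FLAME-style partitioned loop (illustrative sketch only).

    The matrix is viewed as [[ATL, ATR], [ABL, ABR]]. Each iteration
    moves the boundary forward by a block of size nb, maintaining the
    invariant that the top-left part already holds its Cholesky factor.
    """
    A = np.array(A, dtype=float)
    n = A.shape[0]
    k = 0                      # size of the already-factored top-left block
    while k < n:
        b = min(nb, n - k)     # size of the current diagonal block A11
        # Repartition: expose A11 (diagonal block), A21 (below), A22 (trailing)
        A11 = A[k:k+b, k:k+b]
        A21 = A[k+b:, k:k+b]
        A22 = A[k+b:, k+b:]
        # A11 := chol(A11)
        A[k:k+b, k:k+b] = np.linalg.cholesky(A11)
        if A21.size:
            L11 = A[k:k+b, k:k+b]
            # A21 := A21 * inv(L11)^T  (triangular solve, done via solve here)
            A[k+b:, k:k+b] = np.linalg.solve(L11, A21.T).T
            # A22 := A22 - A21 * A21^T (symmetric rank-b update)
            L21 = A[k+b:, k:k+b]
            A[k+b:, k+b:] = A22 - L21 @ L21.T
        k += b                 # advance the boundary; invariant restored
    return np.tril(A)
```

The point of the style, as the paper argues, is that the loop body reads like the mathematical derivation: each update corresponds to one equation obtained from the blocked factorization, which makes the correctness argument and the code track each other.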