FLAME: Formal Linear Algebra Methods Environment
- 1 December 2001
- journal article
- Published by Association for Computing Machinery (ACM) in ACM Transactions on Mathematical Software
- Vol. 27 (4), 422-455
- https://doi.org/10.1145/504210.504213
Abstract

Since the advent of high-performance distributed-memory parallel computing, the need for intelligible code has become ever greater. The development and maintenance of libraries for these architectures is simply too complex to be amenable to conventional approaches to implementation. Attempts to employ traditional methodology have led, in our opinion, to the production of an abundance of anfractuous code that is difficult to maintain and almost impossible to upgrade.

Having struggled with these issues for more than a decade, we have concluded that a solution is to apply a technique from theoretical computer science, formal derivation, to the development of high-performance linear algebra libraries. We believe this approach yields aesthetically pleasing, coherent code that greatly facilitates intelligent modularity and high performance while enhancing confidence in its correctness. Since the technique is language-independent, it lends itself equally well to a wide spectrum of programming languages (and paradigms) ranging from C and Fortran to C++ and Java.

In this paper, we illustrate our observations by looking at the Formal Linear Algebra Methods Environment (FLAME), a framework that facilitates the derivation and implementation of linear algebra algorithms on sequential architectures. This environment demonstrates that lessons learned in the distributed-memory world can guide us toward better approaches even in the sequential world. We present performance experiments on the Intel(R) Pentium(R) III processor that demonstrate that high performance can be attained by coding at a high level of abstraction.
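The abstract's central idea is expressing dense linear algebra algorithms as loops over a partitioned matrix, where each iteration advances a block boundary while maintaining a loop invariant. The following is a minimal illustrative sketch in Python/NumPy of that partitioned-loop style, using a blocked Cholesky factorization; it is not the FLAME API itself, and the function name and block size are chosen here for illustration.

```python
import numpy as np

def chol_flame_style(A, nb=2):
    """Blocked Cholesky factorization written in the spirit of a
    FLAME-style partitioned loop (illustrative sketch only).

    The matrix is viewed as [[ATL, ATR], [ABL, ABR]]. Each iteration
    moves the boundary forward by a block of size nb, maintaining the
    invariant that the top-left part already holds its Cholesky factor.
    """
    A = np.array(A, dtype=float)
    n = A.shape[0]
    k = 0                      # size of the already-factored top-left block
    while k < n:
        b = min(nb, n - k)     # size of the current diagonal block A11
        # Repartition: expose A11 (diagonal block), A21 (below), A22 (trailing)
        A11 = A[k:k+b, k:k+b]
        A21 = A[k+b:, k:k+b]
        A22 = A[k+b:, k+b:]
        # A11 := chol(A11)
        A[k:k+b, k:k+b] = np.linalg.cholesky(A11)
        if A21.size:
            L11 = A[k:k+b, k:k+b]
            # A21 := A21 * inv(L11)^T  (triangular solve, done via solve here)
            A[k+b:, k:k+b] = np.linalg.solve(L11, A21.T).T
            # A22 := A22 - A21 * A21^T (symmetric rank-b update)
            L21 = A[k+b:, k:k+b]
            A[k+b:, k+b:] = A22 - L21 @ L21.T
        k += b                 # advance the boundary; invariant restored
    return np.tril(A)
```

The point of the style, as the paper argues, is that the loop body reads like the mathematical derivation: each update corresponds to one equation obtained from the blocked factorization, which makes the correctness argument and the code track each other.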