Cross-Loop Optimization of Arithmetic Intensity for Finite Element Local Assembly

Open Access

9 January 2015

journal article
research article
Published by Association for Computing Machinery (ACM) in ACM Transactions on Architecture and Code Optimization

Vol. 11 (4), 1-25
https://doi.org/10.1145/2687415

Abstract

We study and systematically evaluate a class of composable code transformations that improve arithmetic intensity in local assembly operations, which represent a significant fraction of the execution time in finite element methods. Optimizing the performance of these operations is challenging. Even though affine loop nests are generally present, the short trip counts and the complexity of the mathematical expressions, which vary among different problems, make it hard to determine an optimal sequence of successful transformations. Our investigation has resulted in the implementation of a compiler (called COFFEE) for local assembly kernels, fully integrated with a framework for developing finite element methods. The compiler manipulates abstract syntax trees generated from a domain-specific language by introducing domain-aware optimizations for instruction-level parallelism and register locality. Finally, it produces C code that includes vector SIMD intrinsics. Experiments using a range of real-world finite element problems of increasing complexity show significant performance improvements. The generality of the approach and the applicability of the proposed code transformations to other domains are also discussed.
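
The abstract does not spell out the individual transformations, so the following is only an illustrative sketch of one representative technique, loop-invariant code motion, applied to a toy local assembly loop nest. The function names, array shapes, and the kernel expression are assumptions made for this example and are not taken from COFFEE. The sketch counts multiplications to show how hoisting sub-expressions out of the innermost loops reduces arithmetic work while leaving the result unchanged.

```python
# Illustrative sketch only: hypothetical kernel and names, not COFFEE output.
# Toy local assembly: M[j][k] = sum_i det * w[i] * A[i][j] * B[i][k]
# i runs over quadrature points, j and k over basis functions (short trip counts).


def assemble_naive(det, w, A, B):
    """Evaluate the full expression in the innermost loop."""
    nq, n = len(w), len(A[0])
    M = [[0.0] * n for _ in range(n)]
    mults = 0  # number of floating-point multiplications performed
    for i in range(nq):
        for j in range(n):
            for k in range(n):
                M[j][k] += det * w[i] * A[i][j] * B[i][k]
                mults += 3
    return M, mults


def assemble_hoisted(det, w, A, B):
    """Same result, with loop-invariant sub-expressions hoisted."""
    nq, n = len(w), len(A[0])
    M = [[0.0] * n for _ in range(n)]
    mults = 0
    for i in range(nq):
        s = det * w[i]  # invariant in both j and k
        mults += 1
        t = [s * A[i][j] for j in range(n)]  # invariant in k, precomputed per i
        mults += n
        for j in range(n):
            for k in range(n):
                M[j][k] += t[j] * B[i][k]
                mults += 1
    return M, mults


def demo():
    """Run both variants on small deterministic data (4 points, 3 basis functions)."""
    nq, n = 4, 3
    det = 2.0
    w = [0.25] * nq
    A = [[0.5 * (i + 1) * (j + 1) for j in range(n)] for i in range(nq)]
    B = [[1.0 + 0.1 * i - 0.2 * k for k in range(n)] for i in range(nq)]
    return assemble_naive(det, w, A, B), assemble_hoisted(det, w, A, B)


if __name__ == "__main__":
    (M1, c1), (M2, c2) = demo()
    print("multiplications, naive vs hoisted:", c1, c2)
```

With these sizes the naive loop nest performs 3 * 4 * 3 * 3 = 108 multiplications, while the hoisted version performs 4 * (1 + 3 + 9) = 52. In a compiled kernel the temporaries would typically live in registers or a small stack array, which is where register locality and vectorization considerations enter.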

Other Versions

Version 2, 2014-07-03, preprints

Funding Information

U.S. National Science Foundation (0811457, 0926687, and 1059417)
MAPDES project
Department of Computing at Imperial College London
EPSRC (EP/I00677X/1, EP/I006761/1, and EP/L000407/1)
Louisiana State University
HiPEAC collaboration grant
NERC (NE/K008951/1 and NE/K006789/1)
U.S. Army through contract W911NF-10-1-000

Cited by 38 articles