Cross-Loop Optimization of Arithmetic Intensity for Finite Element Local Assembly
Open Access
- 9 January 2015
- journal article
- research article
- Published by Association for Computing Machinery (ACM) in ACM Transactions on Architecture and Code Optimization
- Vol. 11 (4), 1-25
- https://doi.org/10.1145/2687415
Abstract
We study and systematically evaluate a class of composable code transformations that improve arithmetic intensity in local assembly operations, which represent a significant fraction of the execution time in finite element methods. Their performance optimization is indeed a challenging issue. Even though affine loop nests are generally present, the short trip counts and the complexity of mathematical expressions, which vary among different problems, make it hard to determine an optimal sequence of successful transformations. Our investigation has resulted in the implementation of a compiler (called COFFEE) for local assembly kernels, fully integrated with a framework for developing finite element methods. The compiler manipulates abstract syntax trees generated from a domain-specific language by introducing domain-aware optimizations for instruction-level parallelism and register locality. Eventually, it produces C code including vector SIMD intrinsics. Experiments using a range of real-world finite element problems of increasing complexity show that significant performance improvement is achieved. The generality of the approach and the applicability of the proposed code transformations to other domains is also discussed.
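To make the notion of a short, affine local assembly loop nest more concrete, the following minimal sketch may help. It is not COFFEE output, and the abstract does not enumerate the transformations studied; loop-invariant code motion is used here only as a familiar example of a rewrite that reduces arithmetic in the innermost loop. The mass-matrix-like expression, the function names `assemble_naive` and `assemble_hoisted`, and the sample data `W`, `B`, and `det` are all assumptions made for illustration.

```python
# Hypothetical element kernel: A[j][k] += W[i] * det * B[i][j] * B[i][k]
# i runs over quadrature points, j and k over basis functions.
# Trip counts are deliberately tiny, as is typical of local assembly.

def assemble_naive(W, B, det):
    """Reference loop nest: all factors are multiplied in the innermost loop."""
    nq, nb = len(W), len(B[0])
    A = [[0.0] * nb for _ in range(nb)]
    mults = 0  # counts floating-point multiplications
    for i in range(nq):
        for j in range(nb):
            for k in range(nb):
                A[j][k] += W[i] * det * B[i][j] * B[i][k]
                mults += 3
    return A, mults


def assemble_hoisted(W, B, det):
    """Same result, with sub-expressions hoisted out of loops they do not depend on."""
    nq, nb = len(W), len(B[0])
    A = [[0.0] * nb for _ in range(nb)]
    mults = 0
    for i in range(nq):
        w = W[i] * det            # invariant in j and k
        mults += 1
        for j in range(nb):
            wb = w * B[i][j]      # invariant in k
            mults += 1
            for k in range(nb):
                A[j][k] += wb * B[i][k]
                mults += 1
    return A, mults


# Made-up sample data: 3 quadrature points, 3 basis functions.
W = [1.0 / 6.0, 1.0 / 6.0, 1.0 / 6.0]
B = [[0.5, 0.5, 0.0],
     [0.0, 0.5, 0.5],
     [0.5, 0.0, 0.5]]
det = 2.0

A_naive, mults_naive = assemble_naive(W, B, det)
A_hoisted, mults_hoisted = assemble_hoisted(W, B, det)
print("multiplications (naive):  ", mults_naive)
print("multiplications (hoisted):", mults_hoisted)
```

For this 3x3x3 nest the naive version performs 3 multiplications in each of the 27 innermost iterations (81 in total), while the hoisted version performs 3 + 9 + 27 = 39. Whether such a rewrite pays off in compiled C code depends on register pressure and vectorization, which is the kind of trade-off the abstract alludes to.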
Funding Information
- U.S. National Science Foundation (0811457, 0926687, and 1059417)
- MAPDES project
- Department of Computing at Imperial College London
- EPSRC (EP/I00677X/1, EP/I006761/1, and EP/L000407/1)
- Louisiana State University
- HiPEAC collaboration grant
- NERC (NE/K008951/1 and NE/K006789/1)
- U.S. Army through contract W911NF-10-1-000
This publication has 24 references indexed in Scilit:
- Optimized code generation for finite element local assembly using symbolic manipulation. ACM Transactions on Mathematical Software, 2013
- Finite Element Integration on GPUs. ACM Transactions on Mathematical Software, 2013
- Compiler Optimizations for Industrial Unstructured Mesh CFD Applications on GPUs. Lecture Notes in Computer Science, 2013
- Performance-Portable Finite Element Assembly Using PyOP2 and FEniCS. Lecture Notes in Computer Science, 2013
- From h to p efficiently: Implementing finite and spectral/hp element methods to achieve optimal performance for low- and high-order discretisations. Journal of Computational Physics, 2010
- Towards generating optimised finite element solvers for GPUs from high-level specifications. Procedia Computer Science, 2010
- Optimizations for quadrature representations of finite element tensors through automated code generation. ACM Transactions on Mathematical Software, 2010
- Performance Optimization of Tensor Contraction Expressions for Many-Body Methods in Quantum Chemistry. The Journal of Physical Chemistry A, 2009
- A compiler for variational forms. ACM Transactions on Mathematical Software, 2006
- Optimizing the Evaluation of Finite Element Matrices. SIAM Journal on Scientific Computing, 2005