Optimizing the LU Factorization for Energy Efficiency on a Many-Core Architecture

1 October 2014

Conference paper (book chapter)
Published by Springer Science and Business Media LLC in Lecture Notes in Computer Science

pp. 237-251
https://doi.org/10.1007/978-3-319-09967-5_14

Abstract

Power consumption and energy efficiency have become a major bottleneck in the design of new high-performance computing systems. The path to exascale computing requires new strategies that decrease the energy consumption of modern many-core architectures without sacrificing scalability or performance. Developing these strategies demands scalable models of energy consumption and a reorientation of optimization techniques toward energy efficiency, with their trade-offs against performance evaluated explicitly. In this paper, we investigate several optimization techniques that reduce energy consumption on many-core architectures with a software-managed memory hierarchy. Using a scalable energy consumption model, we study the impact of these techniques on the Static Energy and the Dynamic Energy of the LU factorization benchmark. The main contributions of this paper are: (1) the modeling and analysis of energy consumption and energy efficiency for LU factorization; (2) the study and design of instruction-level and task-level optimizations that reduce the Static and Dynamic Energy; (3) the design and implementation of an energy-aware tiling that decreases the Dynamic Energy of power-hungry instructions in the LU factorization benchmark; and (4) an experimental evaluation, on the IBM Cyclops-64 many-core architecture, of the scalability of the proposed optimizations and of their improvement in energy consumption and power efficiency. We also study the trade-offs between performance and power efficiency for the proposed optimizations. For the LU factorization benchmark using 156 hardware thread units, our results show power-efficiency improvements between 1.68X and 4.87X across different matrix sizes. In addition, we point out examples of optimizations that scale in performance but not necessarily in power efficiency.
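The abstract splits total energy into a Static Energy and a Dynamic Energy component. The following is a minimal Python sketch of such a two-part model, not the paper's model. It assumes that static energy is a constant static power multiplied by runtime, and that dynamic energy is a per-instruction-class cost summed over the executed instructions, in the spirit of instruction-level energy models. It also assumes power efficiency is measured as floating-point operations per joule, which is equivalent to FLOPS per watt, using the standard leading-order LU operation count of 2n^3/3. The names `EnergyModel`, `lu_flops`, and `power_efficiency`, the instruction classes, and every numeric coefficient are hypothetical placeholders, not Cyclops-64 measurements.

```python
from dataclasses import dataclass


@dataclass
class EnergyModel:
    """Hypothetical two-part energy model: E = E_static + E_dynamic.

    All coefficients are illustrative placeholders, not values from
    the paper or from IBM Cyclops-64 measurements.
    """
    static_power_w: float                  # power drawn regardless of activity (W)
    energy_per_instr_j: dict[str, float]   # assumed energy per instruction, by class (J)

    def static_energy(self, runtime_s: float) -> float:
        # Static energy grows linearly with execution time.
        return self.static_power_w * runtime_s

    def dynamic_energy(self, instr_counts: dict[str, int]) -> float:
        # Dynamic energy: sum over instruction classes of count * per-instruction cost.
        return sum(n * self.energy_per_instr_j[kind] for kind, n in instr_counts.items())

    def total_energy(self, runtime_s: float, instr_counts: dict[str, int]) -> float:
        return self.static_energy(runtime_s) + self.dynamic_energy(instr_counts)


def lu_flops(n: int) -> float:
    # Leading-order floating-point operation count of an n x n LU factorization.
    return 2.0 * n**3 / 3.0


def power_efficiency(n: int, energy_j: float) -> float:
    # FLOPs per joule (equivalently FLOPS per watt) for an n x n LU factorization.
    return lu_flops(n) / energy_j


# Hypothetical comparison: a tiled variant that issues fewer memory operations
# and finishes sooner reduces both the dynamic and the static component.
model = EnergyModel(static_power_w=10.0,
                    energy_per_instr_j={"fp": 1e-9, "load": 4e-9, "store": 4e-9})
baseline = model.total_energy(0.010, {"fp": 1_000_000, "load": 500_000, "store": 250_000})
tiled = model.total_energy(0.005, {"fp": 1_000_000, "load": 125_000, "store": 62_500})
print(f"energy reduction: {baseline / tiled:.2f}x")  # prints "energy reduction: 2.01x"
```

The sketch keeps the two components separate so that optimizations can be attributed to either one: cutting runtime mainly reduces the static term, while removing expensive instructions (for example, memory traffic through tiling) reduces the dynamic term.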
