Practical Structure Layout Optimization and Advice

Abstract
With the delta between processor clock frequency and memory latency ever increasing and with the standard locality improving transformations maturing, compilers increasingly seek to modify an application's data layout to improve spatial and temporal locality and to reduce cache miss and page fault penalties. In this paper we describe a practical implementation of the data layout optimizations Structure Splitting, Structure Peeling, Structure Field Reordering and Dead Field Removal, both for profile and non-profile based compilations. We demonstrate significant performance gains, but find that automatic transformations fail for a relatively high number of record types because of legality violations or profitability constraints. Additionally, we find a class of desirable transformations for which the framework cannot provide satisfying results. To address this issue we complement the automatic transformations with an advisory tool. We reuse the compiler analysis done for automatic transformation and correlate its results with peformance data collected during runtime for structure fields, such as data cache misses and latencies. We then use the compiler as a pefomtance analysis and reporting tool and provide insight into how to layout structure types more eficiently.

This publication has 17 references indexed in Scilit: