Automatic Sublining for Efficient Sparse Memory Accesses

10 May 2021

journal article
research article
Published by Association for Computing Machinery (ACM) in ACM Transactions on Architecture and Code Optimization

Vol. 18 (3), 1-23
https://doi.org/10.1145/3452141

Abstract

Sparse memory accesses, which are scattered accesses to single elements of a large data structure, are a challenge for current processor architectures. Their lack of spatial and temporal locality and their irregularity make caches and traditional stream prefetchers useless. Furthermore, performing standard caching and prefetching on sparse accesses wastes precious memory bandwidth and thrashes caches, deteriorating performance for regular accesses. Bypassing prefetchers and caches for sparse accesses, and fetching only a single element (e.g., 8 B) from main memory (subline access), can solve these issues. Deciding which accesses to handle as sparse accesses and which as regular cached accesses is a challenging task with a large potential impact on performance. Treating sparse accesses as regular accesses reduces performance, but so does leaving accesses that do have locality uncached, which significantly increases their latency and bandwidth consumption. Furthermore, this decision depends on the dynamic environment, such as input set characteristics and system load, making a static decision by the programmer or compiler suboptimal. We propose the Instruction Spatial Locality Estimator (ISLE), a hardware detector that finds instructions that access isolated words in a sea of unused data. These sparse accesses are dynamically converted into uncached subline accesses, while regular accesses remain cached. ISLE does not require modifying source code or binaries, and adapts automatically to a changing environment (input data, available bandwidth, etc.). We apply ISLE to a graph analytics processor running sparse graph workloads, and show that ISLE outperforms a baseline without subline accesses, manual sublining, and prior work on detecting sparse accesses.
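
The trade-off described in the abstract can be illustrated with a small software model. The sketch below is not the ISLE hardware design, which the abstract does not detail; it is a toy under stated assumptions: a 64 B cache line, the 8 B element size from the abstract's example, and a hypothetical class `ToyLocalityEstimator` that counts, per instruction address, how many distinct words of each touched line are actually used. The threshold and minimum sample count are made-up parameters.

```python
from collections import defaultdict

LINE_BYTES = 64  # assumed cache line size (not stated in the abstract)
WORD_BYTES = 8   # element size taken from the abstract's "8 B" example


class ToyLocalityEstimator:
    """Hypothetical per-instruction spatial locality estimate.

    Illustration only: this is not the ISLE mechanism from the paper.
    """

    def __init__(self, threshold=2.0, min_lines=4):
        self.threshold = threshold  # assumed cut-off, in words used per line
        self.min_lines = min_lines  # assumed minimum sample before deciding
        # pc -> line number -> set of word offsets touched in that line
        self.words_by_line = defaultdict(lambda: defaultdict(set))

    def record(self, pc, addr):
        line = addr // LINE_BYTES
        word = (addr % LINE_BYTES) // WORD_BYTES
        self.words_by_line[pc][line].add(word)

    def words_per_line(self, pc):
        lines = self.words_by_line[pc]
        if not lines:
            return 0.0
        return sum(len(words) for words in lines.values()) / len(lines)

    def is_sparse(self, pc):
        # Sparse: enough lines observed, and few words used in each of them.
        enough = len(self.words_by_line[pc]) >= self.min_lines
        return enough and self.words_per_line(pc) < self.threshold


def bytes_fetched(addrs, sublined):
    """Bytes moved from memory, assuming an idealized cache with no evictions."""
    if sublined:
        return len(addrs) * WORD_BYTES  # one element per access, never cached
    return len({a // LINE_BYTES for a in addrs}) * LINE_BYTES


# Two synthetic instructions: one streams through an array, one scatters.
streaming = [i * WORD_BYTES for i in range(64)]  # 8 lines, all 8 words used
scattered = [i * 4096 for i in range(64)]        # 64 lines, 1 word used each

est = ToyLocalityEstimator()
for a in streaming:
    est.record(0x100, a)
for a in scattered:
    est.record(0x200, a)

print(est.is_sparse(0x100), est.is_sparse(0x200))
print(bytes_fetched(scattered, sublined=False),
      bytes_fetched(scattered, sublined=True))
```

In this toy model the scattered instruction uses one word per line, so fetching whole lines moves eight times as many bytes as element-sized accesses would, while the streaming instruction uses every word of each line and gains nothing from sublining. A real detector would also have to account for temporal reuse, finite cache capacity, and changing behavior over time, none of which this sketch models.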

