Access Pattern-Aware Cache Management for Improving Data Utilization in GPU

24 June 2017

journal article
conference paper
Published by Association for Computing Machinery (ACM) in ACM SIGARCH Computer Architecture News

Vol. 45 (2), 307-319
https://doi.org/10.1145/3140659.3080239

Abstract

The long latency of memory operations is a prominent performance bottleneck in graphics processing units (GPUs). The small data cache, which must be shared across dozens of warps (collections of threads), creates significant cache contention and premature data eviction. Prior works have recognized this problem and proposed warp throttling, which reduces the number of active warps contending for cache space. In this paper we discover that individual load instructions in a warp exhibit four different types of data locality behavior: (1) data brought in by a warp load instruction is used only once, which is classified as streaming data; (2) data brought in by a warp load is reused multiple times within the same warp, called intra-warp locality; (3) data brought in by a warp is reused multiple times but across different warps, called inter-warp locality; and (4) some data exhibit a mix of intra- and inter-warp locality. Furthermore, each load instruction consistently exhibits the same locality type across all warps within a GPU kernel. Based on this discovery we argue that cache management must be done using per-load locality type information, rather than by applying warp-wide cache management policies. We propose Access Pattern-aware Cache Management (APCM), which dynamically detects the locality type of each load instruction by monitoring the accesses from one exemplary warp. APCM then uses the detected locality type to selectively apply cache bypassing and cache pinning to the data brought in by each load. Using an extensive set of simulations we show that APCM improves the performance of GPUs by 34% for cache-sensitive applications while saving 27% of energy consumption over a baseline GPU.
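
The per-load classification described in the abstract can be illustrated with a small software sketch. This is not the hardware mechanism from the paper. It is a minimal model that assumes a trace of `(warp_id, pc, line_addr)` tuples, tracks only the cache lines first fetched by one monitored warp, and labels each load PC by who touches those lines again. The names `classify_loads`, `Locality`, `ASSUMED_POLICY`, and the trace format are hypothetical, and the mapping from locality type to bypass or pin is an assumption made for illustration, since the abstract does not spell it out.

```python
from enum import Enum


class Locality(Enum):
    STREAMING = "streaming"
    INTRA_WARP = "intra-warp"
    INTER_WARP = "inter-warp"
    MIXED = "mixed"


# Assumed mapping for illustration only; the abstract states that bypassing
# and pinning are applied selectively but does not give the exact mapping.
ASSUMED_POLICY = {
    Locality.STREAMING: "bypass",
    Locality.INTRA_WARP: "pin",
    Locality.INTER_WARP: "default",
    Locality.MIXED: "default",
}


def classify_loads(trace, monitor_warp=0):
    """Classify each load PC issued by the monitored warp.

    trace: iterable of (warp_id, pc, line_addr) tuples in program order.
    Lines first fetched by `monitor_warp` are tracked; a later access to a
    tracked line counts as intra-warp reuse if it comes from the monitored
    warp and as inter-warp reuse if it comes from any other warp.
    """
    fetch_pc = {}       # tracked line address -> PC of the load that fetched it
    intra_pcs = set()   # PCs whose data was reused by the monitored warp
    inter_pcs = set()   # PCs whose data was reused by another warp

    for warp_id, pc, line in trace:
        if line in fetch_pc:
            if warp_id == monitor_warp:
                intra_pcs.add(fetch_pc[line])
            else:
                inter_pcs.add(fetch_pc[line])
        elif warp_id == monitor_warp:
            fetch_pc[line] = pc

    result = {}
    for pc in set(fetch_pc.values()):
        if pc in intra_pcs and pc in inter_pcs:
            result[pc] = Locality.MIXED
        elif pc in intra_pcs:
            result[pc] = Locality.INTRA_WARP
        elif pc in inter_pcs:
            result[pc] = Locality.INTER_WARP
        else:
            result[pc] = Locality.STREAMING
    return result


if __name__ == "__main__":
    demo_trace = [
        (0, 0x10, 100),                                   # never touched again
        (0, 0x20, 300), (0, 0x20, 300),                   # reused by warp 0
        (0, 0x30, 400), (1, 0x30, 400),                   # reused by warp 1
        (0, 0x40, 500), (0, 0x40, 500), (2, 0x40, 500),   # reused by both
    ]
    for pc, kind in sorted(classify_loads(demo_trace).items()):
        print(hex(pc), kind.value, ASSUMED_POLICY[kind])
```

Because the abstract reports that a load keeps the same locality type across all warps in a kernel, observing a single warp is enough in this sketch to assign a type to every load PC that warp executes.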

Funding Information

National Science Foundation (CAREER-0954211)
National Research Foundation of Korea (2015R1A2A2A01008281)
Defense Advanced Research Projects Agency (PERFECT-HR0011-12-2-0020)

This publication has 27 references indexed in Scilit:

Locality-Driven Dynamic GPU Cache Bypassing
Published by Association for Computing Machinery (ACM), 2015
Divergence-aware warp scheduling
Published by Association for Computing Machinery (ACM), 2013
A locality-aware memory hierarchy for energy-efficient GPU architectures
Published by Association for Computing Machinery (ACM), 2013
BenchFriend
The International Journal of High Performance Computing Applications, 2013
GPUWattch
Published by Association for Computing Machinery (ACM), 2013
Improving Cache Management Policies Using Dynamic Reuse Distances
Published by Institute of Electrical and Electronics Engineers (IEEE), 2012
Characterizing and improving the use of demand-fetched caches in GPUs
Published by Association for Computing Machinery (ACM), 2012
Reducing off-chip memory traffic by selective cache management scheme in GPGPUs
Published by Association for Computing Machinery (ACM), 2012
A closer look at GPUs
Communications of the ACM, 2008
CACTI: an enhanced cache access and cycle time model
IEEE Journal of Solid-State Circuits, 1996

Cited by 9 articles