REMOC

Abstract
The on-chip memories of GPUs, including the register file, shared memory, and L1 cache, provide high-bandwidth, low-latency storage for temporary data. The effective capacity of the L1 cache can be increased by using as cache lines the registers and shared memory that are either unassigned to any warp or thread block or released when warps and thread blocks finish. In this paper, we propose two techniques for managing on-chip memory requests that improve L1 cache efficiency when registers and shared memory are leveraged as cache lines. First, we develop a data-transfer policy, triggered when cache lines are recalled by the first register or shared-memory access of a newly launched warp, that prevents data locality from being destroyed. Second, we design a parallel issue scheme that exploits the inherent parallelism among the register-file, shared-memory, and L1 cache requests of an instruction, reducing processing latency and thereby increasing instruction throughput. Experimental results demonstrate that our approach improves performance by 15% over prior work.
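The data-transfer policy summarized above can be illustrated with a minimal software model. This is only a conceptual sketch, not the paper's hardware mechanism: the `L1Cache` class, its method names, and the arbitrary eviction are all hypothetical stand-ins; in real hardware the transfer would be performed by the cache controller when a newly launched warp's first access reclaims the borrowed storage.

```python
# Hypothetical model: registers/shared memory not assigned to any warp or
# thread block are "borrowed" as extra L1 cache lines. When a newly
# launched warp recalls that storage, the policy moves the cached data
# into an ordinary L1 line instead of discarding it, preserving locality.

class L1Cache:
    def __init__(self, capacity):
        self.capacity = capacity  # number of ordinary L1 lines
        self.lines = {}           # tag -> data held in ordinary L1 lines
        self.borrowed = {}        # tag -> data held in borrowed regs/shmem

    def fill_borrowed(self, tag, data):
        """Cache a line in currently unassigned registers/shared memory."""
        self.borrowed[tag] = data

    def recall(self, tag):
        """A new warp reclaims the storage backing a borrowed line.

        Transfer policy: copy the line into an ordinary L1 line (with a
        simple stand-in eviction if full) rather than dropping the data.
        """
        data = self.borrowed.pop(tag)
        if len(self.lines) >= self.capacity:
            self.lines.pop(next(iter(self.lines)))  # arbitrary eviction
        self.lines[tag] = data

    def lookup(self, tag):
        """Return hit/miss status for either ordinary or borrowed lines."""
        if tag in self.lines:
            return ("hit", self.lines[tag])
        if tag in self.borrowed:
            return ("hit", self.borrowed[tag])
        return ("miss", None)


cache = L1Cache(capacity=2)
cache.fill_borrowed(0x40, b"warp0-data")
cache.recall(0x40)            # a newly launched warp reclaims the registers
print(cache.lookup(0x40)[0])  # the data survived the recall: "hit"
```

Without the transfer step, the recall would evict the line outright, and the subsequent lookup would miss; the policy's point is that reclaiming borrowed storage need not destroy the locality it was caching.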
Funding Information
  • Fundamental Research Funds for the Central Universities of Civil Aviation University of China (3122021053)
