REMOC
- 17 May 2022
- conference paper
- Published by Association for Computing Machinery (ACM) in Proceedings of the 19th ACM International Conference on Computing Frontiers
Abstract
The on-chip memories of GPUs, including the register file, shared memory, and L1 cache, provide high-bandwidth, low-latency temporary storage for data. The capacity of the L1 cache can be increased by using registers and shared memory as cache-lines when they are unassigned to any warp or thread block, or are released once warps or thread blocks finish. In this paper, we propose two techniques to manage requests for on-chip memories and improve the efficiency of an L1 cache that leverages registers and shared memory as cache-lines. First, we develop a data-transferring policy, triggered when cache-lines are recalled by the first register or shared-memory access of a newly launched warp, that prevents the cached data's locality from being lost. Second, we design a parallel issue scheme that exploits the parallelism among the requests of an instruction accessing the register file, shared memory, and L1 cache, decreasing processing latency and thereby increasing instruction throughput. Experimental results demonstrate that our approach improves performance by 15% over prior work.
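The core idea in the abstract — borrowing idle register-file or shared-memory slots as extra L1 cache-lines, and transferring their data back into regular L1 lines when a newly launched warp recalls them — can be sketched in a toy software model. This is an illustrative simulation only, not the paper's hardware implementation; the class name `BorrowingL1`, its capacities, and its eviction behavior are all assumptions made for the sketch.

```python
# Toy model (an assumption-laden sketch, not the paper's design) of an L1
# cache that borrows unassigned register-file slots as extra cache-lines.
# On recall, borrowed lines are migrated into free regular lines instead
# of being discarded, which is the locality-preserving transfer policy
# the abstract describes.

class BorrowingL1:
    def __init__(self, base_lines, borrowed_lines):
        self.base = {}                 # tag -> data, regular L1 lines
        self.base_cap = base_lines
        self.borrowed = {}             # tag -> data, lines borrowed from the RF
        self.borrowed_cap = borrowed_lines

    def insert(self, tag, data):
        # Prefer regular lines; overflow into borrowed register slots.
        if len(self.base) < self.base_cap:
            self.base[tag] = data
        elif len(self.borrowed) < self.borrowed_cap:
            self.borrowed[tag] = data
        else:
            # Simplistic eviction: drop an arbitrary regular line.
            self.base.pop(next(iter(self.base)))
            self.base[tag] = data

    def lookup(self, tag):
        # A hit may come from either the regular or the borrowed region.
        return self.base.get(tag, self.borrowed.get(tag))

    def recall_borrowed(self):
        # A newly launched warp claims its registers, so the borrowed
        # lines must be returned. Transfer their data into any free
        # regular lines rather than dropping all of it.
        for tag in list(self.borrowed):
            if len(self.base) < self.base_cap:
                self.base[tag] = self.borrowed[tag]
            del self.borrowed[tag]
        self.borrowed_cap = 0
```

In this model, a recall without the transfer step would lose every borrowed line; the transfer keeps as much of the working set resident as the regular L1 capacity allows, which is the locality the abstract's policy aims to preserve.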
Funding Information
- Fundamental Research Funds for the Central Universities of Civil Aviation University of China (3122021053)