MRPB: Memory request prioritization for massively parallel processors

1 February 2014

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

p. 272-283
https://doi.org/10.1109/hpca.2014.6835938

Abstract

Massively parallel, throughput-oriented systems such as graphics processing units (GPUs) offer high performance for a broad range of programs. They are, however, complex to program, especially because of their intricate memory hierarchies with multiple address spaces. In response, modern GPUs have widely adopted caches, hoping to provide smoother reductions in memory access traffic and latency. Unfortunately, GPU caches often have mixed or unpredictable performance impact due to cache contention that results from the high thread counts in GPUs. We propose the memory request prioritization buffer (MRPB) to ease GPU programming and improve GPU performance. This hardware structure improves caching efficiency of massively parallel workloads by applying two prioritization methods, request reordering and cache bypassing, to memory requests before they access a cache. MRPB then releases requests into the cache in a more cache-friendly order. The result is drastically reduced cache contention and improved use of the limited per-thread cache capacity. For a simulated 16KB L1 cache, MRPB improves the average performance of the entire PolyBench and Rodinia suites by 2.65× and 1.27×, respectively, outperforming a state-of-the-art GPU cache management technique.

This publication has 18 references indexed in Scilit:

Characterizing and improving the use of demand-fetched caches in GPUs
Published by Association for Computing Machinery (ACM), 2012
DL: A data layout transformation system for heterogeneous computing
Published by Institute of Electrical and Electronics Engineers (IEEE), 2012
Improving GPU performance via large warps and two-level warp scheduling
Published by Association for Computing Machinery (ACM), 2011
Dymaxion
Published by Association for Computing Machinery (ACM), 2011
Many-Thread Aware Prefetching Mechanisms for GPGPU Applications
Published by Institute of Electrical and Electronics Engineers (IEEE), 2010
High performance cache replacement using re-reference interval prediction (RRIP)
Published by Association for Computing Machinery (ACM), 2010
Avoiding cache thrashing due to private data placement in last-level cache for manycore scaling
Published by Institute of Electrical and Electronics Engineers (IEEE), 2009
Many-Core vs. Many-Thread Machines: Stay Away From the Valley
IEEE Computer Architecture Letters, 2009
A closer look at GPUs
Communications of the ACM, 2008
Merrimac
Published by Association for Computing Machinery (ACM), 2003

Cited by 104 articles