Pythia: A Customizable Hardware Prefetching Framework Using Online Reinforcement Learning
- 17 October 2021
- conference paper
- Published by Association for Computing Machinery (ACM)
Abstract
Past research has proposed numerous hardware prefetching techniques, most of which rely on exploiting one specific type of program context information (e.g., program counter, cacheline address, or delta between cacheline addresses) to predict future memory accesses. These techniques either completely neglect a prefetcher’s undesirable effects (e.g., memory bandwidth usage) on the overall system, or incorporate system-level feedback as an afterthought to a system-unaware prefetch algorithm. We show that prior prefetchers often lose their performance benefit over a wide range of workloads and system configurations due to their inherent inability to take multiple different types of program context and system-level feedback information into account while prefetching. In this paper, we make a case for designing a holistic prefetch algorithm that learns to prefetch using multiple different types of program context and system-level feedback information inherent to its design. To this end, we propose Pythia, which formulates the prefetcher as a reinforcement learning agent. For every demand request, Pythia observes multiple different types of program context information to make a prefetch decision. For every prefetch decision, Pythia receives a numerical reward that evaluates prefetch quality under the current memory bandwidth usage. Pythia uses this reward to reinforce the correlation between program context information and prefetch decision to generate highly accurate, timely, and system-aware prefetch requests in the future. Our extensive evaluations using simulation and hardware synthesis show that Pythia outperforms two state-of-the-art prefetchers (MLOP and Bingo) by 3.4% and 3.8% in single-core, 7.7% and 9.6% in twelve-core, and 16.9% and 20.2% in bandwidth-constrained core configurations, while incurring only 1.03% area overhead over a desktop-class processor and no software changes in workloads. 
The source code of Pythia can be freely downloaded from https://github.com/CMU-SAFARI/Pythia.
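The abstract's formulation of the prefetcher as a reinforcement learning agent can be made concrete with a small sketch. The code below is an illustrative approximation, not the paper's actual design: it uses tabular Q-learning with a state built from the program counter and the last cacheline-address delta, actions that are candidate prefetch offsets, and a system-aware reward that pays less for useful prefetches when memory bandwidth is saturated and penalizes useless ones. All names and parameter values here are assumptions chosen for the sketch.

```python
import random
from collections import defaultdict

class RLPrefetcher:
    """Illustrative tabular RL prefetcher (hypothetical sketch, not Pythia)."""
    def __init__(self, offsets=(1, 2, 4, 8), alpha=0.3, gamma=0.5,
                 epsilon=0.1, seed=42):
        self.q = defaultdict(float)        # Q-value per (state, offset) pair
        self.offsets = offsets             # candidate prefetch offsets (actions)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)

    def choose(self, state):
        # epsilon-greedy selection over candidate prefetch offsets
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.offsets)
        return max(self.offsets, key=lambda o: self.q[(state, o)])

    def update(self, state, offset, reward, next_state):
        # standard one-step Q-learning backup
        best_next = max(self.q[(next_state, o)] for o in self.offsets)
        td_target = reward + self.gamma * best_next
        self.q[(state, offset)] += self.alpha * (td_target - self.q[(state, offset)])

def reward_for(prefetched, next_demand, bandwidth_busy):
    # system-aware reward: a useful prefetch earns less under high
    # bandwidth pressure; a useless prefetch is always penalized
    if prefetched == next_demand:
        return 1.0 if not bandwidth_busy else 0.25
    return -1.0

def train(agent, stream, pc=0x400, bandwidth_busy=False):
    # stream: sequence of cacheline addresses observed for one load PC
    for i in range(1, len(stream) - 1):
        state = (pc, stream[i] - stream[i - 1])   # program context: PC + delta
        offset = agent.choose(state)
        r = reward_for(stream[i] + offset, stream[i + 1], bandwidth_busy)
        next_state = (pc, stream[i + 1] - stream[i])
        agent.update(state, offset, r, next_state)

agent = RLPrefetcher()
train(agent, [i * 4 for i in range(200)])         # stride-4 access stream
learned = max(agent.offsets, key=lambda o: agent.q[((0x400, 4), o)])
print(learned)  # the agent converges on the stride of 4
```

Note that Pythia itself uses richer program context features and a hardware-friendly value store; this sketch only conveys the shape of the state-action-reward loop the abstract describes.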