Page Placement Strategies for GPUs within Heterogeneous Memory Systems
- 14 March 2015
- journal article
- research article
- Published by Association for Computing Machinery (ACM) in ACM SIGARCH Computer Architecture News
- Vol. 43 (1), 607-618
- https://doi.org/10.1145/2786763.2694381
Abstract
Systems from smartphones to supercomputers are increasingly heterogeneous, being composed of both CPUs and GPUs. To maximize cost and energy efficiency, these systems will increasingly use globally-addressable heterogeneous memory systems, making choices about memory page placement critical to performance. In this work we show that current page placement policies are not sufficient to maximize GPU performance in these heterogeneous memory systems. We propose two new page placement policies that improve GPU performance: one application agnostic and one using application profile information. Our application agnostic policy, bandwidth-aware (BW-AWARE) placement, maximizes GPU throughput by balancing page placement across the memories based on the aggregate memory bandwidth available in a system. Our simulation-based results show that BW-AWARE placement outperforms the existing Linux INTERLEAVE and LOCAL policies by 35% and 18% on average for GPU compute workloads. We build upon BW-AWARE placement by developing a compiler-based profiling mechanism that provides programmers with information about GPU application data structure access patterns. Combining this information with simple program-annotated hints about memory placement, our hint-based page placement approach performs within 90% of oracular page placement on average, largely mitigating the need for costly dynamic page tracking and migration.
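The abstract describes BW-AWARE placement only at a high level: pages are balanced across memories according to the aggregate bandwidth available. The Python sketch below is one possible reading of that idea, not the authors' implementation. It assumes each memory receives a share of pages equal to its share of total bandwidth. The function names, the memory labels `gpu_local` and `cpu_attached`, and the bandwidth figures are hypothetical and chosen only for illustration.

```python
def bw_aware_fractions(bandwidths):
    """Return each memory's share of the aggregate bandwidth.

    `bandwidths` maps a memory name to its bandwidth (any consistent unit).
    Assumption: the target page share equals the bandwidth share.
    """
    total = sum(bandwidths.values())
    return {name: bw / total for name, bw in bandwidths.items()}


def place_pages(num_pages, bandwidths):
    """Assign pages one at a time to the memory furthest below its target share.

    This deterministic "largest deficit first" rule is an illustrative choice;
    the abstract does not say how individual pages are distributed.
    """
    fractions = bw_aware_fractions(bandwidths)
    counts = {name: 0 for name in bandwidths}
    placement = []
    for page in range(num_pages):
        # Deficit = pages this memory should hold so far minus pages it holds.
        target = max(counts, key=lambda n: fractions[n] * (page + 1) - counts[n])
        counts[target] += 1
        placement.append(target)
    return placement


# Hypothetical system: a fast GPU-local memory and a slower CPU-attached memory.
example_bandwidths = {"gpu_local": 200.0, "cpu_attached": 80.0}
example_placement = place_pages(280, example_bandwidths)
print(example_placement.count("gpu_local"), example_placement.count("cpu_attached"))
```

Under these assumed numbers the faster memory holds 200/280 of the pages. A uniform interleave would instead place half the pages in each memory regardless of bandwidth, and a local-only policy would place none in the second memory, leaving its bandwidth unused.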
Funding Information
- US Department of Energy
- NSF (CCF-0845157)