A Case for Fine-grain Coherence Specialization in Heterogeneous Systems
Open Access
- 22 August 2022
- journal article
- research article
- Published by Association for Computing Machinery (ACM) in ACM Transactions on Architecture and Code Optimization
- Vol. 19 (3), 1-26
- https://doi.org/10.1145/3530819
Abstract
Hardware specialization is becoming a key enabler of energy-efficient performance. Future systems will be increasingly heterogeneous, integrating multiple specialized and programmable accelerators, each with different memory demands. Traditionally, communication between accelerators has been inefficient, typically orchestrated through explicit DMA transfers between different address spaces. More recently, industry has proposed unified coherent memory which enables implicit data movement and more data reuse, but often these interfaces limit the coherence flexibility available to heterogeneous systems. This paper demonstrates the benefits of fine-grained coherence specialization for heterogeneous systems. We propose an architecture that enables low-complexity independent specialization of each individual coherence request in heterogeneous workloads by building upon a simple and flexible baseline coherence interface, Spandex. We then describe how to optimize individual memory requests to improve cache reuse and performance-critical memory latency in emerging heterogeneous workloads. Collectively, our techniques enable significant gains, reducing execution time by up to 61% or network traffic by up to 99% while adding minimal complexity to the Spandex protocol.
Funding Information
- National Science Foundation (CCF 16-19245)
- DARPA, Domain-Specific System on Chip (DSSoC) program
- Google Faculty Research
- Applications Driving Architectures (ADA) Research Center