A Case for Fine-grain Coherence Specialization in Heterogeneous Systems

Open Access

22 August 2022

Journal article (research article)
Published by Association for Computing Machinery (ACM) in ACM Transactions on Architecture and Code Optimization

Vol. 19 (3), 1-26
https://doi.org/10.1145/3530819

Abstract

Hardware specialization is becoming a key enabler of energy-efficient performance. Future systems will be increasingly heterogeneous, integrating multiple specialized and programmable accelerators, each with different memory demands. Traditionally, communication between accelerators has been inefficient, typically orchestrated through explicit DMA transfers between different address spaces. More recently, industry has proposed unified coherent memory, which enables implicit data movement and greater data reuse, but these interfaces often limit the coherence flexibility available to heterogeneous systems. This paper demonstrates the benefits of fine-grained coherence specialization for heterogeneous systems. We propose an architecture that enables low-complexity, independent specialization of each individual coherence request in heterogeneous workloads by building upon a simple and flexible baseline coherence interface, Spandex. We then describe how to optimize individual memory requests to improve cache reuse and reduce performance-critical memory latency in emerging heterogeneous workloads. Collectively, our techniques enable significant gains, reducing execution time by up to 61% or network traffic by up to 99% while adding minimal complexity to the Spandex protocol.
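The abstract's central idea, letting each memory request choose its own coherence strategy instead of applying one protocol-wide policy, can be illustrated with a toy message-count model. This is a hypothetical sketch, not Spandex's request types or the paper's cost model: the `ReadPolicy` names, the two-message costs for a miss and for an invalidation, and the single-line, single-reader traces are all assumptions. It contrasts a GPU-style read that self-invalidates at synchronization with a CPU-style read that is tracked and invalidated by writers, and picks the cheaper one per access stream.

```python
from enum import Enum


class ReadPolicy(Enum):
    """Hypothetical per-request read strategies (illustrative names only)."""
    SELF_INVALIDATE = "self_invalidate"  # GPU-style: drop the copy at each sync
    TRACKED_SHARER = "tracked_sharer"    # CPU-style: keep the copy until a writer invalidates it


MISS_COST = 2  # assumed: request + data response
INV_COST = 2   # assumed: invalidation + acknowledgment


def read_traffic(trace, policy):
    """Count messages attributable to one reader of a single cache line.

    `trace` is a list of "read", "sync", or "remote_write" events. Messages
    sent by the writer itself are ignored, on the assumption that they are
    the same under both read policies.
    """
    cached = False
    messages = 0
    for event in trace:
        if event == "read":
            if not cached:
                messages += MISS_COST
                cached = True
        elif event == "sync":
            # Self-invalidating readers discard possibly stale data here.
            if policy is ReadPolicy.SELF_INVALIDATE:
                cached = False
        elif event == "remote_write":
            # Tracked sharers are invalidated by the writer; self-invalidating
            # readers are not tracked, so the write costs them nothing now.
            if policy is ReadPolicy.TRACKED_SHARER and cached:
                messages += INV_COST
                cached = False
        else:
            raise ValueError(f"unknown event: {event!r}")
    return messages


def best_policy(trace):
    """Pick the cheaper policy for this access stream (ties favor the first)."""
    costs = {policy: read_traffic(trace, policy) for policy in ReadPolicy}
    return min(costs, key=costs.get), costs


read_mostly = ["read", "sync", "read", "sync", "read", "sync", "read"]
write_heavy = ["read", "remote_write", "sync", "read", "remote_write", "sync", "read"]

for name, trace in [("read_mostly", read_mostly), ("write_heavy", write_heavy)]:
    policy, costs = best_policy(trace)
    print(name, policy.name, {p.name: c for p, c in costs.items()})
```

Under these assumed costs, the read-mostly stream favors tracked sharing (2 vs. 8 messages) while the write-heavy stream favors self-invalidation (6 vs. 10). No single global policy is best for both, which is the intuition behind specializing each request independently.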

Funding Information

National Science Foundation (CCF 16-19245)
DARPA Domain-Specific System on Chip (DSSoC) program
Google Faculty Research Award
Applications Driving Architectures (ADA) Research Center
