Mechanisms for store-wait-free multiprocessors
- 9 June 2007
- journal article
- Published by Association for Computing Machinery (ACM) in ACM SIGARCH Computer Architecture News
- Vol. 35 (2), 266-277
- https://doi.org/10.1145/1273440.1250696
Abstract
Store misses cause significant delays in shared-memory multiprocessors because of limited store buffering and ordering constraints required for proper synchronization. Today, programmers must choose from a spectrum of memory consistency models that reduce store stalls at the cost of increased programming complexity. Prior research suggests that the performance gap among consistency models can be closed through speculation--enforcing order only when dynamically necessary. Unfortunately, past designs either provide insufficient buffering, replace all stores with read-modify-write operations, and/or recover from ordering violations via impractical fine-grained rollback mechanisms. We propose two mechanisms that, together, enable store-wait-free implementations of any memory consistency model. To eliminate buffer-capacity-related stalls, we propose the scalable store buffer, which places private/speculative values directly into the L1 cache, thereby eliminating the non-scalable associative search of conventional store buffers. To eliminate ordering-related stalls, we propose atomic sequence ordering, which enforces ordering constraints over coarse-grain access sequences while relaxing order among individual accesses. Using cycle-accurate full-system simulation of scientific and commercial applications, we demonstrate that these mechanisms allow the simplified programming of strict ordering while outperforming conventional implementations on average by 32% (sequential consistency), 22% (SPARC total store order) and 9% (SPARC relaxed memory order).
This publication has 26 references indexed in Scilit:
- BulkSC. Published by Association for Computing Machinery (ACM), 2007
- CAVA. ACM Transactions on Architecture and Code Optimization, 2006
- Transactional Memory. Synthesis Lectures on Computer Architecture, 2006
- Speculative synchronization. Published by Association for Computing Machinery (ACM), 2002
- A chip-multiprocessor architecture with speculative multithreading. International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 1999
- Data speculation support for a chip multiprocessor. Published by Association for Computing Machinery (ACM), 1998
- Multiprocessors should support simple memory consistency models. Computer, 1998
- Using speculative retirement and larger instruction windows to narrow the performance gap between memory consistency models. Published by Association for Computing Machinery (ACM), 1997
- Shared memory consistency models: a tutorial. Computer, 1996
- Performance evaluation of memory consistency models for shared-memory multiprocessors. Published by Association for Computing Machinery (ACM), 1991