Access order and effective bandwidth for streams on a Direct Rambus memory
- 1 January 1999
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
Abstract
Processor speeds are increasing rapidly, and memory speeds are not keeping up. Streaming computations (such as multimedia or scientific applications) are among those whose performance is most limited by the memory bottleneck. Rambus hopes to bridge the processor/memory performance gap with a recently introduced DRAM that can deliver up to 1.6 Gbytes/sec. We analyze the performance of these interesting new memory devices on the inner loops of streaming computations, both for traditional memory controllers that treat all DRAM transactions as random cacheline accesses, and for controllers augmented with streaming hardware. For our benchmarks, we find that accessing unit-stride streams in cacheline bursts in the natural order of the computation exploits 44% to 76% of the peak bandwidth of a memory system composed of a single Direct RDRAM device, and that accessing streams via a streaming mechanism with a simple access ordering scheme can improve performance by factors of 1.18 to 2.25.
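As a rough illustration of the figures in the abstract, the sketch below converts the reported fractions of peak bandwidth into delivered bandwidth. It assumes that the 1.6 Gbytes/sec quoted for the device is also the peak of the single-device memory system; the function and constant names are hypothetical and do not come from the paper. The speedup factors are not applied here, since the abstract does not say which benchmark each factor pairs with.

```python
# Peak rate quoted in the abstract; assumed here to be the peak of the
# single-device Direct RDRAM memory system.
PEAK_GBYTES_PER_SEC = 1.6


def effective_bandwidth(fraction_of_peak, peak=PEAK_GBYTES_PER_SEC):
    """Return delivered bandwidth in Gbytes/sec for a given fraction of peak."""
    if not 0.0 <= fraction_of_peak <= 1.0:
        raise ValueError("fraction_of_peak must be between 0 and 1")
    return fraction_of_peak * peak


# Natural-order cacheline accesses: 44% to 76% of peak, per the abstract.
low = effective_bandwidth(0.44)
high = effective_bandwidth(0.76)
print(f"{low:.3f} to {high:.3f} Gbytes/sec")
```

Under that assumption, natural-order access delivers roughly 0.7 to 1.2 Gbytes/sec of the 1.6 Gbytes/sec peak.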