Exploring Efficient Hardware Support for Applications with Irregular Memory Patterns on Multinode Manycore Architectures
- 5 August 2014
- journal article
- research article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Parallel and Distributed Systems
- Vol. 28 (6), 1635-1648
- https://doi.org/10.1109/tpds.2014.2345073
Abstract
With computing systems becoming ubiquitous, numerous data sets of extremely large size are becoming available for analysis. The collected data often have complex, graph-based structures, which makes them difficult to process with traditional tools. Moreover, the irregularities in the data sets and in the analysis algorithms hamper performance scaling on large distributed high-performance systems, which are optimized for locality exploitation and regular data structures. In this paper we present an approach to system design that enables efficient execution of applications with irregular memory patterns on a distributed, many-core architecture based on off-the-shelf cores. We introduce a set of hardware and software components that provide a distributed global address space, fine-grained synchronization, and latency hiding of remote accesses through multithreading. An FPGA prototype has been implemented to explore the design with a set of typical irregular kernels. Finally, we present an analytical model that highlights the benefits of the approach and helps identify the bottlenecks in the prototype. The experimental evaluation on graph-based applications demonstrates the scalability of the architecture across different configurations of the whole system.
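To make the idea of hiding remote-access latency with multithreading concrete, the sketch below uses a simple deterministic utilization model. It is an illustrative assumption, not the analytical model from the paper (the abstract does not give its form). Each thread computes for `run` cycles, then issues a remote access that takes `latency` cycles, and switching threads costs `switch` cycles; all three parameter names are hypothetical. With too few threads the core idles waiting on memory; with enough threads only the switch overhead is lost.

```python
import math


def utilization(n_threads: int, run: float, latency: float, switch: float) -> float:
    """Fraction of core cycles spent on useful work (hypothetical simple model).

    Assumes every thread alternates `run` compute cycles with one remote
    access of `latency` cycles, and a context switch costs `switch` cycles.
    """
    if n_threads < 1:
        raise ValueError("need at least one thread")
    # Linear region: threads cannot cover the latency, so the core idles.
    linear = n_threads * run / (run + latency)
    # Saturated region: latency is fully hidden; only switch overhead is lost.
    saturated = run / (run + switch)
    return min(linear, saturated)


def threads_to_saturate(run: float, latency: float, switch: float) -> int:
    """Smallest thread count at which the model reaches its saturated region."""
    return math.ceil((run + latency) / (run + switch))


if __name__ == "__main__":
    # Example: 10 compute cycles per 90-cycle remote access, 2-cycle switch.
    for n in (1, 4, 9, 16):
        print(n, round(utilization(n, run=10, latency=90, switch=2), 3))
    print("threads needed:", threads_to_saturate(run=10, latency=90, switch=2))
```

Under these assumptions, the number of hardware threads needed grows roughly with the ratio of remote latency to compute time between accesses. This is why multithreaded cores can tolerate the long, unpredictable remote accesses typical of irregular graph kernels.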