Co-processing SPMD computation on CPUs and GPUs cluster

Abstract
Heterogeneous parallel systems with multiprocessors and accelerators are becoming ubiquitous because of their better cost-performance and energy efficiency. The processors in these systems have different instruction sets and are optimized for either task latency or throughput. Running SPMD tasks on such heterogeneous devices raises challenges in both programmability and performance. To meet these challenges, we implemented a parallel runtime system that co-processes SPMD computation on clusters of CPUs and GPUs. Furthermore, we propose an analytic model that automatically schedules SPMD tasks on heterogeneous clusters. Because the model is derived from the roofline model, it can be applied to a wide range of SPMD applications and hardware devices. Experimental results for C-means, GMM, and GEMV show good speedup in practical heterogeneous cluster environments.
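
The abstract does not spell out the analytic model, so the following is only a minimal sketch of how a roofline-style estimate could drive a CPU/GPU work split; it is not the authors' scheduler. It assumes that each device's attainable throughput is the standard roofline bound, min(peak compute, memory bandwidth × arithmetic intensity), and that work is divided in proportion to those bounds. The `Device` class, the function names, and all hardware numbers are hypothetical, and data-transfer costs (for example over PCIe) are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    """Hypothetical device description; the numbers used below are illustrative."""
    name: str
    peak_gflops: float    # peak compute throughput, GFLOP/s
    bandwidth_gbs: float  # peak memory bandwidth, GB/s


def attainable_gflops(dev: Device, intensity: float) -> float:
    """Roofline bound for a kernel with the given arithmetic intensity (FLOP/byte)."""
    return min(dev.peak_gflops, dev.bandwidth_gbs * intensity)


def cpu_fraction(cpu: Device, gpu: Device, intensity: float) -> float:
    """Share of the workload given to the CPU, proportional to attainable throughput.

    Assumption: both devices finish at the same time when each receives work
    in proportion to its roofline bound; transfer overheads are not modeled.
    """
    p_cpu = attainable_gflops(cpu, intensity)
    p_gpu = attainable_gflops(gpu, intensity)
    return p_cpu / (p_cpu + p_gpu)


# Illustrative (made-up) hardware parameters.
cpu = Device("cpu", peak_gflops=100.0, bandwidth_gbs=50.0)
gpu = Device("gpu", peak_gflops=1000.0, bandwidth_gbs=200.0)

# Low intensity: both devices are bandwidth-bound (GEMV-like behaviour).
low_intensity_share = cpu_fraction(cpu, gpu, intensity=0.25)

# High intensity: both devices are compute-bound.
high_intensity_share = cpu_fraction(cpu, gpu, intensity=100.0)
```

Under these assumed numbers the CPU's share is larger for the bandwidth-bound kernel than for the compute-bound one, because the gap between the two devices' memory bandwidths is smaller than the gap between their peak compute rates.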
