A Comparative Survey of the HPC and Big Data Paradigms: Analysis and Experiments
- 1 September 2016
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE) in 2016 IEEE International Conference on Cluster Computing (CLUSTER)
- p. 423-432
- https://doi.org/10.1109/cluster.2016.21
Abstract
Many scientific data analytic applications need huge amounts of input, which can often consist of more than several TBs of data. This emphasizes the high I/O and processing/computational cost requirements of these algorithms. Tasks in these programs can induce more I/O operations than computations or the opposite. Hardware also includes nodes with large storage devices and/or nodes with sophisticated computational capabilities. To embrace the heterogeneity of the hardware systems in non-cloud and cloud environments, the issues of resource and job allocation in these environments need to be revisited. High-Performance Computing models, under the leadership of MPI (plus OpenMP) parallel APIs, have mostly met users' requirements in terms of high computational performance, while Big Data frameworks such as Spark have performed likewise in terms of high-level programming, resiliency and I/O handling. Therefore, in order to meet the specialized needs of scientists, there is a need for convergence between HPC and Big Data ecosystems. This paper presents a data-supported, comparative survey of the main current HPC and Big Data programming interfaces, namely MPI, OpenMP, PGAS (OpenSHMEM), Spark, and Hadoop, and their software stacks. A comprehensive experimental study of these interfaces on a set of benchmarks, namely reduction and I/O microbenchmarks, the StackExchange AnswersCount benchmark, and PageRank Benchmark has been performed on a single platform in order to achieve a fair comparison. These experiments lead to a thorough discussion about whether the envisioned convergence is needed or not, efficient or not, and whether it is the best solution to tackle future computational challenges.