Variation Among Processors Under Turbo Boost in HPC Systems
- 1 June 2016
- conference paper
- Published by Association for Computing Machinery (ACM) in Proceedings of the 2016 International Conference on Supercomputing
Abstract
The design and manufacture of present-day CPUs cause inherent variation in supercomputer architectures, such as variation in the power and temperature of the chips. This variation also manifests itself as frequency differences among processors under Turbo Boost dynamic overclocking, and it can lead to unpredictable and suboptimal performance in tightly coupled HPC applications. In this study, we use compute-intensive kernels and applications to analyze the variation among processors in four top supercomputers: Edison, Cab, Stampede, and Blue Waters. We observe an execution time difference of up to 16% among processors on the Turbo Boost-enabled supercomputers (Edison, Cab, and Stampede). There is less than 1% variation on Blue Waters, which does not have a dynamic overclocking feature. We analyze measurements from temperature and power instrumentation and find that intrinsic differences in the chips' power efficiency are the culprit behind the frequency variation. Moreover, we analyze potential solutions for mitigating the variation, such as disabling Turbo Boost, leaving cores idle, and replacing slow chips. We also propose a speed-aware dynamic task redistribution (load balancing) algorithm to reduce the negative effects of performance variation. Our speed-aware load balancing algorithm improves performance by up to 18% over no load balancing and is 6% better than its non-speed-aware counterpart.
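The abstract does not describe how the speed-aware load balancer works, so the following is only an illustrative sketch of the general idea, not the authors' algorithm. It assumes each task has a known cost in abstract work units and each processor has a measured relative speed (for example, derived from its observed Turbo Boost frequency). The function names `round_robin_assign`, `speed_aware_assign`, and `makespan` are hypothetical. The sketch uses a simple greedy heuristic: tasks are placed, largest first, on whichever processor would finish them earliest given its speed.

```python
def round_robin_assign(task_costs, speeds):
    """Speed-unaware baseline: deal tasks out to processors in turn."""
    assignment = [[] for _ in speeds]
    for task_id in range(len(task_costs)):
        assignment[task_id % len(speeds)].append(task_id)
    return assignment


def speed_aware_assign(task_costs, speeds):
    """Greedy heuristic (an assumption, not the paper's method):
    place each task, largest first, on the processor that would
    complete it earliest, where time = assigned work / speed."""
    assignment = [[] for _ in speeds]
    load = [0.0] * len(speeds)  # work units assigned to each processor
    order = sorted(range(len(task_costs)), key=lambda i: -task_costs[i])
    for task_id in order:
        cost = task_costs[task_id]
        best = min(range(len(speeds)),
                   key=lambda p: (load[p] + cost) / speeds[p])
        assignment[best].append(task_id)
        load[best] += cost
    return assignment


def makespan(assignment, task_costs, speeds):
    """Time at which the slowest-finishing processor completes its tasks."""
    return max(sum(task_costs[t] for t in tasks) / speed
               for tasks, speed in zip(assignment, speeds))


if __name__ == "__main__":
    costs = [1.0] * 10          # ten equal tasks (hypothetical workload)
    speeds = [1.0, 1.0, 0.5]    # one processor runs at half speed
    for name, assign in (("round robin", round_robin_assign),
                         ("speed aware", speed_aware_assign)):
        print(name, makespan(assign(costs, speeds), costs, speeds))
```

In a tightly coupled application the slowest processor sets the pace, which is why the sketch compares the two strategies by makespan rather than by average load.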
This publication has 20 references indexed in Scilit:
- Analyzing and mitigating the impact of manufacturing variability in power-constrained supercomputing. Published by Association for Computing Machinery (ACM), 2015
- Minimizing Thermal Variation Across System Components. Published by Institute of Electrical and Electronics Engineers (IEEE), 2015
- Noise-Tolerant Explicit Stencil Computations for Nonuniform Process Execution Rates. ACM Transactions on Parallel Computing, 2015
- Energy-efficient computing for HPC workloads on heterogeneous manycore chips. Published by Association for Computing Machinery (ACM), 2015
- Maximizing Throughput of Overprovisioned HPC Data Centers Under a Strict Power Budget. Published by Institute of Electrical and Electronics Engineers (IEEE), 2014
- A 'cool' way of improving the reliability of HPC machines. Published by Association for Computing Machinery (ACM), 2013
- Roofline. Communications of the ACM, 2009
- Just In Time Dynamic Voltage Scaling: Exploiting Inter-Node Slack to Save Energy in MPI Programs. Published by Institute of Electrical and Electronics Engineers (IEEE), 2005
- Scalable molecular dynamics with NAMD. Journal of Computational Chemistry, 2005
- A Portable Programming Interface for Performance Evaluation on Modern Processors. The International Journal of High Performance Computing Applications, 2000