uiCA
- 28 June 2022
- conference paper
- Published by Association for Computing Machinery (ACM) in Proceedings of the 36th ACM International Conference on Supercomputing
Abstract
Performance models that statically predict the steady-state throughput of basic blocks on particular microarchitectures, such as IACA, Ithemal, llvm-mca, OSACA, or CQA, can guide optimizing compilers and aid manual software optimization. However, their utility heavily depends on the accuracy of their predictions. The average error of existing models compared to measurements on the actual hardware has been shown to lie between 9% and 36%. But how good is this? To answer this question, we propose an extremely simple analytical throughput model that may serve as a baseline. Surprisingly, this model is already competitive with the state of the art, indicating that there is significant potential for improvement. To explore this potential, we develop a simulation-based throughput predictor. To this end, we propose a detailed parametric pipeline model that supports all Intel Core microarchitecture generations released between 2011 and 2021. We evaluate our predictor on an improved version of the BHive benchmark suite and show that its predictions are usually within 1% of measurement results, improving upon prior models by roughly an order of magnitude. The experimental evaluation also demonstrates that several microarchitectural details considered to be rather insignificant in previous work are in fact essential for accurate prediction. Our throughput predictor is available as open source.
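The abstract quotes average prediction errors (9% to 36% for prior models, about 1% for the new predictor) without defining the metric. The sketch below assumes these figures are a mean absolute percentage error over per-block throughputs, which is a common choice but is not confirmed by this record. The function name and the sample values are hypothetical and chosen only for illustration; they are not results from the paper.

```python
def mean_absolute_percentage_error(measured, predicted):
    """Return the mean of |predicted - measured| / measured, in percent.

    Assumption: this is the "average error" the abstract refers to.
    Both inputs are sequences of throughputs (e.g. cycles per iteration).
    """
    if len(measured) != len(predicted):
        raise ValueError("measured and predicted must have the same length")
    if len(measured) == 0:
        raise ValueError("at least one basic block is required")

    total = 0.0
    for m, p in zip(measured, predicted):
        if m <= 0:
            raise ValueError("measured throughput must be positive")
        total += abs(p - m) / m
    return 100.0 * total / len(measured)


if __name__ == "__main__":
    # Hypothetical throughputs for three basic blocks (illustrative only).
    measured = [1.0, 2.0, 4.0]
    predicted = [1.1, 2.0, 3.0]
    error = mean_absolute_percentage_error(measured, predicted)
    print(f"average error: {error:.2f}%")
```

Normalizing each error by the measured value keeps short and long basic blocks on the same scale, so a few long blocks do not dominate the average.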
Funding Information
- European Research Council (101020415)
This publication has 27 references indexed in Scilit:
- Quantifying Performance Bottlenecks of Stencil Computations Using the Execution-Cache-Memory Model. Published by Association for Computing Machinery (ACM), 2015
- CQA: A code quality analyzer tool at binary level. Published by Institute of Electrical and Electronics Engineers (IEEE), 2014
- ZSim. Published by Association for Computing Machinery (ACM), 2013
- Constraint-Based Register Allocation and Instruction Scheduling. Lecture Notes in Computer Science, 2012
- MARSS. Published by Association for Computing Machinery (ACM), 2011
- The gem5 simulator. ACM SIGARCH Computer Architecture News, 2011
- Roofline. Communications of the ACM, 2009
- PTLsim: A Cycle Accurate Full System x86-64 Microarchitectural Simulator. Published by Institute of Electrical and Electronics Engineers (IEEE), 2007
- SimpleScalar: an infrastructure for computer system modeling. Computer, 2002
- A New Measure of Rank Correlation. Biometrika, 1938