moTuner

Abstract
Arithmetic operators are now used in a wide spectrum of domains, including artificial intelligence, data analytics, and scientific computing. Meanwhile, specialized hardware components that enable low-precision computing are increasingly deployed in GPUs and accelerators. While promising to boost performance, accelerating operators on such hardware requires manually tuning the mixed-precision knobs to balance performance and accuracy, which can be extremely challenging in practice. To address this issue, we present moTuner, an automatic framework for efficiently tuning mixed-precision operators. moTuner works at the compiler level to automatically enable mixed-precision computation, without requiring any manual modification of the source code and/or the operator library, thus significantly alleviating the programming burden. Because it is implemented in the compilation phase, moTuner is widely applicable with little library-specific effort. Further, moTuner adopts an optimized search strategy to effectively narrow down the configuration space during tuning. Evaluations on GEMM operators and real applications demonstrate that moTuner achieves performance improvements of up to 3.13x and 1.15x respectively, while guaranteeing considerably high accuracy.
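To illustrate the trade-off the abstract describes, below is a minimal, illustrative sketch in Python/NumPy of a mixed-precision "knob" for a GEMM operator: each candidate precision is tried from fastest to slowest and the first one whose error stays within a budget is kept. The names (`gemm_error`, `pick_precision`, `error_budget`) are hypothetical and not part of moTuner, which operates at the compiler level on GPU/accelerator low-precision units rather than via NumPy.

```python
import numpy as np

def gemm_error(a, b, dtype):
    """Run GEMM at the given precision and return the relative error vs. an fp64 reference."""
    ref = a.astype(np.float64) @ b.astype(np.float64)
    approx = (a.astype(dtype) @ b.astype(dtype)).astype(np.float64)
    return np.linalg.norm(approx - ref) / np.linalg.norm(ref)

def pick_precision(a, b, error_budget=1e-3):
    """Try candidate precisions from fastest (lowest) to slowest (highest)
    and keep the first one whose accuracy loss stays within the budget."""
    for dtype in (np.float16, np.float32, np.float64):
        if gemm_error(a, b, dtype) <= error_budget:
            return dtype
    return np.float64

rng = np.random.default_rng(0)
a = rng.standard_normal((512, 512))
b = rng.standard_normal((512, 512))
print(pick_precision(a, b))  # e.g. float32 for this error budget
```

A real tuner faces many such knobs at once (one per operator call site), which is why an optimized search strategy to prune the configuration space, as the abstract notes, matters.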
Funding Information
  • National Natural Science Foundation of China (62102465, U1811461)
  • Program for Guangdong Introducing Innovative and Entrepreneurial Teams (2016ZT06D211)
  • Guangdong Natural Science Foundation (2018B030312002)
  • Major Program of Guangdong Basic and Applied Research (2019B030302002)
  • CCF-Baidu Open Fund (CCF-BAIDU OF2021032)
