On the Anatomy of Predictive Models for Accelerating GPU Convolution Kernels and Beyond

Open Access

7 January 2021

journal article
research article
Published by Association for Computing Machinery (ACM) in ACM Transactions on Architecture and Code Optimization

Vol. 18 (1), 1-24
https://doi.org/10.1145/3434402

Abstract

Efficient HPC libraries often expose multiple tunable parameters, algorithmic implementations, or a combination of them, to provide optimized routines. The optimal parameters and algorithmic choices may depend on input properties such as the shapes of the matrices involved in the operation. Traditionally, these parameters are manually tuned or set by auto-tuners. In emerging applications such as deep learning, this approach is not effective across the wide range of inputs and architectures used in practice. In this work, we analyze different machine learning techniques and predictive models to accelerate the convolution operator and GEMM. Moreover, we address the problem of dataset generation, and we study the performance, accuracy, and generalization ability of the models. Our insights allow us to improve the performance of computationally expensive deep learning primitives on high-end GPUs as well as low-power embedded GPU architectures across three different libraries. Experimental results show that simple decision-tree- and MLP-based models yield significant improvements in the target applications, from 50% up to 300%, compared with auto-tuned and highly optimized vendor-based heuristics.
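As a rough illustration of the idea in the abstract (predicting an algorithmic choice from input shapes), the sketch below fits a one-level decision tree, a decision stump, that maps a GEMM shape (M, N, K) to a kernel variant. It is not the authors' model or dataset: the variant names `small_tile` and `large_tile`, the training samples, and all function names are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

Shape = Tuple[int, int, int]   # (M, N, K) of a GEMM call
Sample = Tuple[Shape, str]     # shape plus the variant assumed to be fastest

# Hypothetical training set. The labels are invented for illustration and
# are NOT measurements from the paper or from any real library.
TRAIN: List[Sample] = [
    ((32, 32, 32), "small_tile"),
    ((64, 64, 64), "small_tile"),
    ((128, 64, 32), "small_tile"),
    ((512, 512, 512), "large_tile"),
    ((1024, 1024, 256), "large_tile"),
    ((2048, 512, 1024), "large_tile"),
]


def gini(labels: List[str]) -> float:
    """Gini impurity of a list of class labels (0.0 means pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def majority(labels: List[str]) -> str:
    """Most frequent label in a non-empty list."""
    return max(set(labels), key=labels.count)


def fit_stump(samples: List[Sample]) -> Tuple[int, float, str, str]:
    """Exhaustively search (feature, threshold) pairs and keep the split
    with the lowest weighted Gini impurity.

    Returns (feature_index, threshold, label_if_le, label_if_gt).
    """
    best = None
    for f in range(3):
        values = sorted({x[f] for x, _ in samples})
        # Candidate thresholds are midpoints between consecutive values,
        # so both sides of every candidate split are non-empty.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for x, y in samples if x[f] <= t]
            right = [y for x, y in samples if x[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(samples)
            if best is None or score < best[0]:
                best = (score, f, t, majority(left), majority(right))
    if best is None:
        raise ValueError("need at least two distinct values in some feature")
    return best[1], best[2], best[3], best[4]


def predict(stump: Tuple[int, float, str, str], shape: Shape) -> str:
    """Pick a kernel variant for a new shape using the fitted stump."""
    feature, threshold, label_le, label_gt = stump
    return label_le if shape[feature] <= threshold else label_gt


STUMP = fit_stump(TRAIN)
```

In a real setting the labels would come from timing each variant on each shape, and a deeper tree or an MLP would replace the single split. The stump only shows the shape-to-choice mapping that such models learn, and why inference is cheap enough to run before every library call.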

Funding Information

UNIBZ RTD call 2018 (IN2087)

