GEVO

Abstract
GPUs are a key enabler of the revolution in machine learning and high-performance computing, functioning as de facto co-processors that accelerate large-scale computation. As the programming stack and tool support have matured, GPUs have also become accessible to programmers who may lack detailed knowledge of the underlying architecture and thus fail to fully leverage the GPU's computational power. GEVO (Gpu optimization using EVOlutionary computation) is a tool for automatically discovering optimization opportunities and tuning the performance of GPU kernels in the LLVM representation. GEVO uses population-based search to find edits to GPU code compiled to LLVM-IR, improving performance on desired criteria while retaining required functionality. We demonstrate that GEVO improves the execution time of general-purpose GPU programs and machine learning (ML) models on an NVIDIA Tesla P100. For the Rodinia benchmarks, GEVO improves GPU kernel runtime performance by an average of 49.48%, and by as much as 412%, over the fully compiler-optimized baseline. If kernel output accuracy is relaxed to tolerate up to 1% error, GEVO can find kernel variants that outperform the baseline by an average of 51.08%. For the ML workloads, GEVO achieves kernel performance improvements for SVM on the MNIST handwriting recognition (3.24×) and the a9a income prediction (2.93×) datasets with no loss of model accuracy. GEVO achieves a 1.79× kernel performance improvement on image classification using ResNet18/CIFAR-10, with less than 1% reduction in model accuracy.
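To make the search strategy concrete, below is a minimal, self-contained Python sketch of the population-based loop the abstract describes: individuals are lists of edits, variants are scored by a fitness function, and selection, crossover, and mutation produce the next generation. This is illustrative only, not GEVO's implementation: the edit representation, the edit operators, and the fitness function are toy stand-ins (GEVO mutates real LLVM-IR and scores variants by compiling them, checking kernel output, and measuring runtime); all names and parameters here are assumptions for the sketch.

```python
import random

# Toy stand-in for GEVO-style search. An "edit" is an abstract
# (operation, location) pair rather than a real LLVM-IR mutation,
# and fitness() is synthetic rather than a measured kernel runtime.
EDIT_OPS = ["copy", "delete", "move", "swap"]
NUM_LOCATIONS = 100  # pretend the kernel IR has 100 edit sites

def random_edit():
    return (random.choice(EDIT_OPS), random.randrange(NUM_LOCATIONS))

def fitness(edits):
    """Stand-in for: apply edits to the IR, compile, verify the kernel
    output is still correct, and time it. Lower is better."""
    runtime = 100.0
    for op, loc in edits:
        # Synthetic effect: some edit sites help, others hurt slightly.
        runtime += -1.0 if loc % 7 == 0 else 0.1
    return runtime

def mutate(edits):
    """Randomly drop one edit or append a new one."""
    child = list(edits)
    if child and random.random() < 0.5:
        child.pop(random.randrange(len(child)))
    else:
        child.append(random_edit())
    return child

def crossover(a, b):
    """Single-point crossover over the two parents' edit lists."""
    return a[: len(a) // 2] + b[len(b) // 2 :]

def evolve(pop_size=32, generations=50):
    population = [[random_edit()] for _ in range(pop_size)]
    for _ in range(generations):
        elite = sorted(population, key=fitness)[: pop_size // 4]
        offspring = []
        while len(elite) + len(offspring) < pop_size:
            a, b = random.sample(elite, 2)
            offspring.append(mutate(crossover(a, b)))
        population = elite + offspring
    best = min(population, key=fitness)
    return best, fitness(best)

if __name__ == "__main__":
    best, score = evolve()
    print(f"best variant applies {len(best)} edits, fitness {score:.1f}")
```

The key design point the sketch preserves is that correctness acts as a constraint inside fitness evaluation: in GEVO, a variant whose kernel output deviates beyond the allowed tolerance (e.g., the 1% error budget mentioned above) is rejected regardless of how fast it runs.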
Funding Information
  • Defense Advanced Research Projects Agency (FA8750-19C-0003)
  • National Science Foundation (CCF-1618039, SHF-1652132, CCF-1908633)
  • Air Force Research Laboratory (FA8750-19-1-0501)
