Towards neural architecture-aware exploration of compiler optimizations in a deep learning graph compiler

Abstract
Deep Neural Networks (DNNs) form the basis for many existing and emerging applications. Many deep learning (DL) compilers analyze the computation graph and apply various optimizations at different stages. These high-level optimizations are applied as compiler passes before the resulting computation graph is handed off for low-level, hardware-specific optimization. With advances in DNN architectures and backend hardware, the search space of compiler optimizations has grown manifold. Moreover, including passes without knowledge of the computation graph increases execution time while having little effect on the intermediate representation. This paper presents preliminary results that 1) summarize the relevance of pass selection and ordering in a DL compiler, 2) describe neural architecture-aware selection of optimization passes, and 3) prune the search space for the phase-selection problem in a DL compiler. We use TVM as the compiler and demonstrate experimental results on Nvidia A100 and GeForce RTX 2080 GPUs, establishing the relevance of neural architecture-aware selection of optimization passes for DNNs in DL compilers. Experimental evaluation with seven models categorized into four architecturally different classes demonstrated performance gains for most neural networks. For ResNets, the average throughput increased by 24% and 32% for the TensorFlow and PyTorch frameworks, respectively. Additionally, we observed an average 15% decrease in compilation time for ResNets, 45% for MobileNet, and 54% for SSD-based models without impacting throughput. BERT models showed a dramatic improvement, with a 92% reduction in compilation time.
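To make the idea of architecture-aware pass selection concrete, the following is a minimal sketch using TVM's Relay pass infrastructure. The per-architecture pass lists and the `PASS_SETS` / `build_with_selected_passes` names are hypothetical illustrations, not the authors' implementation; the individual transforms (`SimplifyInference`, `FoldConstant`, `FuseOps`, `EliminateCommonSubexpr`) are real Relay passes.

```python
# Sketch: select a subset of Relay optimization passes based on the
# model's architecture class, instead of running every pass unconditionally.
import tvm
from tvm import relay

# Hypothetical mapping from architecture class to pass list. For example,
# convolution-heavy models might keep operator fusion, while transformer
# models might drop passes that rarely change their IR, saving compile time.
PASS_SETS = {
    "resnet": [
        relay.transform.SimplifyInference(),
        relay.transform.FoldConstant(),
        relay.transform.FuseOps(fuse_opt_level=2),
    ],
    "bert": [
        relay.transform.SimplifyInference(),
        relay.transform.EliminateCommonSubexpr(),
        relay.transform.FoldConstant(),
    ],
}

def build_with_selected_passes(mod, params, arch_class, target="cuda"):
    """Apply only the passes selected for this architecture class,
    then build the module for the given target."""
    seq = tvm.transform.Sequential(PASS_SETS[arch_class])
    with tvm.transform.PassContext(opt_level=3):
        mod = seq(mod)
        return relay.build(mod, target=target, params=params)
```

The design point is simply that the pass pipeline becomes a function of the network's architecture class rather than a fixed sequence, which is one way the compile-time reductions reported in the abstract could arise: passes with negligible effect on a given class of computation graphs are skipped outright.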
Funding Information
  • U.S. Office of the Under Secretary of Defense for Research and Engineering (OUSD(R&E)) (FA8750-15-2-0119)
