DyFiP
- 5 April 2022
- conference paper
- Published by Association for Computing Machinery (ACM) in Proceedings of the 2nd European Workshop on Machine Learning and Systems
Abstract
Filter pruning is one of the most effective ways to accelerate Convolutional Neural Networks (CNNs). Most existing work focuses on static pruning of CNN filters. Existing work on dynamic pruning of CNN filters is based on switching between different branches of a CNN or exiting early, depending on the hardness of a sample. These approaches can reduce the average latency of inference, but they cannot reduce the longest-path latency of inference. In contrast, we present a novel approach to dynamic filter pruning that uses explainable AI along with early coarse prediction in the intermediate layers of a CNN. This coarse prediction is performed by a simple branch that is trained to perform top-k classification. The branch either predicts the output class with high confidence, in which case the remaining computations are skipped, or predicts that the output class lies within a subset of the possible output classes. After this coarse prediction, only those filters that are important for this subset of classes are evaluated. The importance of each filter for each output class is obtained using explainable AI. With this concept of dynamic pruning, we reduce not only the average latency of inference but also the longest-path latency of inference. Our proposed architecture for dynamic pruning can be deployed on different hardware platforms.
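The decision made after the coarse branch can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `select_filters`, the thresholds `exit_threshold` and `keep_threshold`, and the layout of the `importance` matrix are all assumptions made for illustration, and the matrix is assumed to be precomputed offline by whatever explainable-AI attribution method is used.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()


def select_filters(branch_logits, importance, k=3,
                   exit_threshold=0.9, keep_threshold=0.5):
    """Decide between early exit and class-subset filter pruning.

    branch_logits: shape (num_classes,), output of the coarse branch.
    importance:    shape (num_classes, num_filters), hypothetical per-class
                   filter importance scores, assumed precomputed offline.
    Returns (predicted_class, None) on early exit, otherwise
    (None, mask), where mask marks the filters to evaluate.
    """
    probs = softmax(np.asarray(branch_logits, dtype=float))
    top1 = int(np.argmax(probs))

    # Case 1: the branch is confident, so the remaining layers are skipped.
    if probs[top1] >= exit_threshold:
        return top1, None

    # Case 2: keep only filters that matter for at least one top-k class.
    topk = np.argsort(probs)[-k:][::-1]
    mask = (importance[topk] >= keep_threshold).any(axis=0)
    return None, mask


# Toy importance matrix: 4 classes, 4 filters (values are made up).
importance = np.array([
    [0.9, 0.1, 0.0, 0.2],
    [0.1, 0.8, 0.0, 0.1],
    [0.0, 0.0, 0.7, 0.3],
    [0.0, 0.1, 0.1, 0.9],
])

confident = select_filters(np.array([10.0, 0.0, 0.0, 0.0]), importance)
uncertain = select_filters(np.array([2.0, 1.9, 0.0, -1.0]), importance, k=2)
```

In this toy setup, the first call exits early with class 0, while the second call keeps only the filters that are important for the two most likely classes. Because the second path still skips filters, the worst-case input also runs a reduced network, which is how this kind of scheme can lower the longest-path latency and not just the average.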