Scalpel: Customizing DNN Pruning to the Underlying Hardware Parallelism
- 24 June 2017
- conference paper, republished as a journal article
- Published by Association for Computing Machinery (ACM) in ACM SIGARCH Computer Architecture News
- Vol. 45 (2), 548-560
- https://doi.org/10.1145/3140659.3080215
Abstract
As the size of Deep Neural Networks (DNNs) continues to grow to increase accuracy and solve more complex problems, their energy footprint also scales. Weight pruning reduces DNN model size and computation by removing redundant weights. However, we implemented weight pruning for several popular networks on a variety of hardware platforms and observed surprising results. For many networks, the network sparsity caused by weight pruning actually hurts overall performance despite large reductions in model size and required multiply-accumulate operations. Also, encoding the sparse format of pruned networks incurs additional storage overhead. To overcome these challenges, we propose Scalpel, which customizes DNN pruning to the underlying hardware by matching the pruned network structure to the data-parallel hardware organization. Scalpel consists of two techniques: SIMD-aware weight pruning and node pruning. For low-parallelism hardware (e.g., microcontrollers), SIMD-aware weight pruning maintains weights in aligned fixed-size groups to fully utilize the SIMD units. For high-parallelism hardware (e.g., GPUs), node pruning removes redundant nodes, not redundant weights, thereby reducing computation without sacrificing the dense matrix format. For hardware with moderate parallelism (e.g., desktop CPUs), SIMD-aware weight pruning and node pruning are synergistically applied together. Across the microcontroller, CPU, and GPU, Scalpel achieves mean speedups of 3.54x, 2.61x, and 1.25x while reducing model sizes by 88%, 82%, and 53%. In comparison, traditional weight pruning achieves mean speedups of 1.90x, 1.06x, and 0.41x across the three platforms.
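The SIMD-aware weight pruning described above can be illustrated with a minimal sketch: instead of zeroing individual weights, whole aligned groups of SIMD-width weights are kept or dropped together, judged by the group's aggregate magnitude. This is a hypothetical illustration, not the paper's implementation; the group size, the RMS importance measure, and the threshold value are assumptions for the example.

```python
import numpy as np

def simd_aware_prune(weights, group_size=4, threshold=0.1):
    """Zero out aligned fixed-size groups of weights whose root-mean-square
    magnitude falls below `threshold`, so surviving nonzeros stay packed in
    SIMD-width groups. Hypothetical sketch, not the paper's implementation."""
    rows, cols = weights.shape
    assert cols % group_size == 0, "pad columns to a multiple of the SIMD width"
    pruned = weights.copy()
    # View each row as consecutive aligned groups of `group_size` weights.
    groups = pruned.reshape(rows, cols // group_size, group_size)
    # Group importance: RMS of the weights in the group (an assumed measure).
    rms = np.sqrt((groups ** 2).mean(axis=2))
    # Drop entire low-importance groups; alignment of survivors is preserved,
    # so a SIMD unit always loads a full group of useful weights.
    groups[rms < threshold] = 0.0
    return pruned

w = np.array([[0.50, -0.40, 0.60, 0.30, 0.01, 0.02, -0.01, 0.00],
              [0.02,  0.00, 0.01, 0.00, 0.70, -0.50, 0.40, 0.60]])
p = simd_aware_prune(w, group_size=4, threshold=0.1)
```

With `group_size=4`, the low-magnitude half of each row is removed as a unit, while the high-magnitude half survives intact, so only one group index per surviving group needs to be stored rather than one index per weight.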