Reduced-Complexity Deep Neural Networks Design Using Multi-Level Compression

Abstract
Deep neural networks (DNNs) have achieved great success in many fields. However, many DNN models are both deep and large, causing high storage and energy consumption during both the training and inference phases. This paper proposes a multi-level compression framework. By combining cross-layer parameter-reduction techniques ranging from structure compression to weight compression to representation compression, the proposed strategy enables an order-of-magnitude reduction in network size for both training and inference with negligible accuracy loss, yielding DNN models that are both highly efficient and highly accurate. Experiments show that the proposed strategy achieves a compression ratio of roughly 1,800x on the dense matrices and roughly 30x on the overall model.
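The abstract names three compression levels but does not spell out the individual techniques. Below is a minimal, self-contained sketch of what such a three-level pipeline could look like, using common stand-ins as assumptions: SVD-based low-rank factorization for structure compression, magnitude pruning for weight compression, and uniform quantization for representation compression. All function names and parameter choices here are illustrative, not the paper's implementation.

```python
import numpy as np

def low_rank_factorize(W, rank):
    """Structure compression (assumed technique): approximate W with a
    rank-r factorization W ~= A @ B, shrinking parameter count from
    m*n to r*(m+n)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]   # shape (m, r)
    B = Vt[:rank, :]             # shape (r, n)
    return A, B

def prune_small_weights(W, sparsity):
    """Weight compression (assumed technique): zero out the fraction
    `sparsity` of entries with the smallest magnitudes."""
    k = int(W.size * sparsity)
    threshold = np.partition(np.abs(W).ravel(), k)[k]
    return np.where(np.abs(W) < threshold, np.float32(0.0), W)

def quantize_uniform(W, num_bits):
    """Representation compression (assumed technique): uniform
    quantization to `num_bits` bits; dequantize as q * scale + w_min."""
    levels = 2 ** num_bits - 1
    w_min, w_max = float(W.min()), float(W.max())
    scale = (w_max - w_min) / levels
    q = np.round((W - w_min) / scale).astype(np.uint8)
    return q, w_min, scale

# Example: apply all three levels to one dense layer's weight matrix.
rng = np.random.default_rng(0)
W = rng.standard_normal((512, 512)).astype(np.float32)
A, B = low_rank_factorize(W, rank=64)        # structure level
A = prune_small_weights(A, sparsity=0.5)     # weight level
q, w_min, scale = quantize_uniform(A, 8)     # representation level
```

In a pipeline like this, the three levels compound multiplicatively: the low-rank step cuts the parameter count, pruning makes the remaining factors sparse, and quantization shrinks the bits per surviving weight, which is how per-matrix ratios can far exceed the whole-model ratio reported above.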
