Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference
- 1 June 2018
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- p. 2704-2713
- https://doi.org/10.1109/cvpr.2018.00286
Abstract
The rising popularity of intelligent mobile devices and the daunting computational cost of deep learning-based models call for efficient and accurate on-device inference schemes. We propose a quantization scheme that allows inference to be carried out using integer-only arithmetic, which can be implemented more efficiently than floating point inference on commonly available integer-only hardware. We also co-design a training procedure to preserve end-to-end model accuracy post quantization. As a result, the proposed quantization scheme improves the tradeoff between accuracy and on-device latency. The improvements are significant even on MobileNets, a model family known for run-time efficiency, and are demonstrated in ImageNet classification and COCO detection on popular CPUs.
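The integer-only scheme the abstract refers to represents each real value r by an integer q via an affine mapping r = S(q - Z), where the scale S is a real number and the zero-point Z is an integer, so that real zero is represented exactly. The sketch below illustrates this mapping for 8-bit quantization; it is a minimal illustration of affine quantization, not the paper's full training or inference pipeline, and the function names are my own.

```python
import numpy as np

def choose_qparams(r_min, r_max, num_bits=8):
    """Pick scale S and zero-point Z so that r = S * (q - Z)
    covers [r_min, r_max] with q in [0, 2**num_bits - 1],
    and real zero maps exactly onto an integer q."""
    qmin, qmax = 0, 2**num_bits - 1
    r_min = min(r_min, 0.0)  # the representable range must include zero
    r_max = max(r_max, 0.0)
    scale = (r_max - r_min) / (qmax - qmin)
    zero_point = int(round(qmin - r_min / scale))
    zero_point = max(qmin, min(qmax, zero_point))  # keep Z in range
    return scale, zero_point

def quantize(r, scale, zero_point, num_bits=8):
    """Map real values to unsigned integers: q = clamp(round(r/S) + Z)."""
    q = np.round(np.asarray(r, dtype=np.float64) / scale) + zero_point
    return np.clip(q, 0, 2**num_bits - 1).astype(np.uint8)

def dequantize(q, scale, zero_point):
    """Recover an approximation of r: r ≈ S * (q - Z)."""
    return scale * (q.astype(np.float64) - zero_point)
```

Because zero is representable exactly, common operations such as zero-padding introduce no quantization error; the round-trip error for in-range values is bounded by half the scale.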