Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference
- 1 June 2018
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- p. 2704-2713
- https://doi.org/10.1109/cvpr.2018.00286
Abstract
The rising popularity of intelligent mobile devices and the daunting computational cost of deep learning-based models call for efficient and accurate on-device inference schemes. We propose a quantization scheme that allows inference to be carried out using integer-only arithmetic, which can be implemented more efficiently than floating point inference on commonly available integer-only hardware. We also co-design a training procedure to preserve end-to-end model accuracy post quantization. As a result, the proposed quantization scheme improves the tradeoff between accuracy and on-device latency. The improvements are significant even on MobileNets, a model family known for run-time efficiency, and are demonstrated in ImageNet classification and COCO detection on popular CPUs.
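The integer-only scheme the abstract refers to represents each real value r by an integer q via an affine mapping r = S(q - Z), where the scale S is a real number and the zero-point Z is an integer, so that real zero is represented exactly. The sketch below illustrates this mapping for 8-bit quantization; it is a minimal illustration of affine quantization, not the paper's full training or inference pipeline, and the function names are my own.

```python
import numpy as np

def choose_qparams(r_min, r_max, num_bits=8):
    """Pick scale S and zero-point Z so that r = S * (q - Z)
    covers [r_min, r_max] with q in [0, 2**num_bits - 1],
    and real zero maps exactly onto an integer q."""
    qmin, qmax = 0, 2**num_bits - 1
    r_min = min(r_min, 0.0)  # the representable range must include zero
    r_max = max(r_max, 0.0)
    scale = (r_max - r_min) / (qmax - qmin)
    zero_point = int(round(qmin - r_min / scale))
    zero_point = max(qmin, min(qmax, zero_point))  # keep Z in range
    return scale, zero_point

def quantize(r, scale, zero_point, num_bits=8):
    """Map real values to unsigned integers: q = clamp(round(r/S) + Z)."""
    q = np.round(np.asarray(r, dtype=np.float64) / scale) + zero_point
    return np.clip(q, 0, 2**num_bits - 1).astype(np.uint8)

def dequantize(q, scale, zero_point):
    """Recover an approximation of r: r ≈ S * (q - Z)."""
    return scale * (q.astype(np.float64) - zero_point)
```

Because zero is representable exactly, common operations such as zero-padding introduce no quantization error; the round-trip error for in-range values is bounded by half the scale.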