
A novel quantization method combined with knowledge distillation for deep neural networks
Author(s) -
Zhou Hu,
Chunxiao Fan,
Guangming Song
Publication year - 2021
Publication title -
Journal of Physics: Conference Series
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.21
H-Index - 85
eISSN - 1742-6596
pISSN - 1742-6588
DOI - 10.1088/1742-6596/1976/1/012026
Subject(s) - quantization (signal processing) , artificial neural network , learning vector quantization , computer science , computation , distillation , inference , algorithm , deep learning , resampling , artificial intelligence , computer engineering , chemistry , organic chemistry
The massive parameters and intensive computations of neural networks limit their deployment on embedded devices with poor storage and computing power. To solve this problem, a novel quantization algorithm combined with Knowledge Distillation (KD) is proposed to reduce model size and speed up the inference of deep models. The proposed method consists of two phases, KD-Training and Quantization-Retraining. KD-Training trains a compact student model with pre-quantized weights using the proposed pre-quantized constraint loss. In Quantization-Retraining, the pre-quantized weights are quantized to 2^n, and the first and last layers of the network are retrained to make up for the accuracy loss caused by quantization. Experiments on the CIFAR-10 dataset show that the proposed method obtains a low-precision (2-5 bit) quantized student model with a compact structure whose test accuracy even exceeds that of its full-precision (32-bit) reference, owing to improved generalization. It achieves higher performance than that obtained by other quantization methods. Moreover, since the quantized weights are constrained to {±2^n}, the method is well suited to accelerating network computation in hardware.
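As a rough illustration of the power-of-two constraint described above (not the authors' exact formulation; the bit-width handling, exponent range, and function name quantize_pow2 are assumptions introduced here), the following sketch projects each weight onto the nearest value in {±2^n}, which is what makes hardware acceleration via bit shifts possible.

```python
import numpy as np

def quantize_pow2(weights, n_bits=3):
    """Project each weight onto the nearest power-of-two value in {±2^n}.

    Hypothetical exponent handling: with n_bits levels per sign, exponents
    run from e_max down to e_max - (2**n_bits - 1), where e_max is taken
    from the largest weight magnitude. The paper's exact scheme may differ.
    """
    w = np.asarray(weights, dtype=np.float64)
    sign = np.sign(w)
    mag = np.abs(w)

    # Largest exponent, derived from the maximum weight magnitude.
    e_max = int(np.floor(np.log2(mag.max() + 1e-12)))
    e_min = e_max - (2 ** n_bits - 1)

    # Round log2(|w|) to the nearest integer exponent and clamp to range;
    # zeros are floored to the smallest representable magnitude before the log.
    exp = np.round(np.log2(np.maximum(mag, 2.0 ** e_min)))
    exp = np.clip(exp, e_min, e_max)

    # Multiplying by 2^exp in hardware reduces to a shift of the exponent.
    return sign * (2.0 ** exp)

# Example: quantize a few weights to 3-bit power-of-two values.
w = np.array([0.73, -0.12, 0.031, -0.5])
print(quantize_pow2(w, n_bits=3))  # -> [0.5, -0.125, 0.03125, -0.5]
```

In the paper's pipeline this projection would be applied after KD-Training, followed by retraining to recover the accuracy lost to quantization; the sketch only covers the projection step itself.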