
A novel quantization method combined with knowledge distillation for deep neural networks
Author(s) -
Zhou Hu,
Chunxiao Fan,
Guangming Song
Publication year - 2021
Publication title -
Journal of Physics: Conference Series
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.21
H-Index - 85
eISSN - 1742-6596
pISSN - 1742-6588
DOI - 10.1088/1742-6596/1976/1/012026
Subject(s) - quantization (signal processing) , artificial neural network , learning vector quantization , computer science , computation , distillation , inference , algorithm , deep learning , resampling , artificial intelligence , computer engineering , chemistry , organic chemistry
The massive parameters and intensive computations of neural networks limit their deployment on embedded devices with poor storage and computing power. To solve this problem, a novel quantization algorithm combined with Knowledge Distillation (KD) is proposed to reduce model size and speed up the inference of deep models. The proposed method consists of two phases, KD-Training and Quantization-Retraining. KD-Training trains a compact student model with pre-quantized weights using the proposed pre-quantized constraint loss. In Quantization-Retraining, the pre-quantized weights are quantized to 2^n, and the first and last layers of the network are retrained to make up for the accuracy loss caused by quantization. Experiments on the CIFAR-10 dataset show that the proposed method obtains a low-precision (2-5 bit) quantized student model with a compact structure whose test accuracy even exceeds that of its full-precision (32-bit) reference, owing to improved generalization. It achieves higher performance than that obtained by other quantization methods. Moreover, since the quantized weights are constrained to {±2^n}, the method is well suited to accelerating network computation in hardware.
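As a rough illustration of the power-of-two constraint described above (not the authors' exact formulation; the bit-width handling, exponent range, and function name quantize_pow2 are assumptions introduced here), the following sketch projects each weight onto the nearest value in {±2^n}, which is what makes hardware acceleration via bit shifts possible.

```python
import numpy as np

def quantize_pow2(weights, n_bits=3):
    """Project each weight onto the nearest power-of-two value in {±2^n}.

    Hypothetical exponent handling: with n_bits levels per sign, exponents
    run from e_max down to e_max - (2**n_bits - 1), where e_max is taken
    from the largest weight magnitude. The paper's exact scheme may differ.
    """
    w = np.asarray(weights, dtype=np.float64)
    sign = np.sign(w)
    mag = np.abs(w)

    # Largest exponent, derived from the maximum weight magnitude.
    e_max = int(np.floor(np.log2(mag.max() + 1e-12)))
    e_min = e_max - (2 ** n_bits - 1)

    # Round log2(|w|) to the nearest integer exponent and clamp to range;
    # zeros are floored to the smallest representable magnitude before the log.
    exp = np.round(np.log2(np.maximum(mag, 2.0 ** e_min)))
    exp = np.clip(exp, e_min, e_max)

    # Multiplying by 2^exp in hardware reduces to a shift of the exponent.
    return sign * (2.0 ** exp)

# Example: quantize a few weights to 3-bit power-of-two values.
w = np.array([0.73, -0.12, 0.031, -0.5])
print(quantize_pow2(w, n_bits=3))  # -> [0.5, -0.125, 0.03125, -0.5]
```

In the paper's pipeline this projection would be applied after KD-Training, followed by retraining to recover the accuracy lost to quantization; the sketch only covers the projection step itself.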