The impact of integrating shallow and deep information on knowledge distillation
Author(s) -
Yilin Miao,
Yuhong Tang,
Huangliang Ren,
Jianjun Li
Publication year - 2025
Publication title -
IEEE Access
Language(s) - English
Resource type - Magazines
SCImago Journal Rank - 0.587
H-Index - 127
eISSN - 2169-3536
DOI - 10.1109/access.2025.3571732
Subject(s) - aerospace , bioengineering , communication, networking and broadcast technologies , components, circuits, devices and systems , computing and processing , engineered materials, dielectrics and plasmas , engineering profession , fields, waves and electromagnetics , general topics for engineers , geoscience , nuclear engineering , photonics and electrooptics , power, energy and industry applications , robotics and control systems , signal processing and analysis , transportation
Knowledge distillation is a key technique for compressing neural networks, leveraging insights from a large teacher model to enhance the generalization capability of a smaller student model. The ResNet family, recognized for its residual connections, is widely used across visual tasks: it mitigates gradient vanishing and degradation in deep neural networks, making training more manageable, and its strong feature extraction makes it a common backbone for knowledge distillation. However, our experimental findings reveal that ResNet models fall short in extracting shallow information from images, which in turn limits the performance of the corresponding student model. To address this, we propose a shallow feature extraction module (SFEM) that enlarges the receptive field through dilated convolutions and captures information at multiple scales, while reducing the parameter count and computational cost. In addition, to enhance the model's sensitivity to both horizontal and vertical directions, we introduce a full-dimensional perceptual (FDP) attention mechanism. Experimental results on the CIFAR-100 dataset show that, compared to the baseline, our method improves student performance under same-architecture guidance by 1.73%, 1.41%, 1.16%, and 0.72%, respectively, and under cross-architecture guidance by 2.27% and 1.51%, respectively. Compared to state-of-the-art knowledge distillation methods (e.g., RKD and VID), our student model improves by 1.03% and 1.56%, respectively.
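The abstract describes two ingredients: a shallow feature extraction module built on multi-scale dilated convolutions, and a teacher-student distillation objective. The paper's exact SFEM and FDP attention designs are not given here, so the sketch below is only a minimal, hypothetical PyTorch illustration of the general ideas: a multi-branch dilated-convolution block (a stand-in for SFEM) and the standard Hinton-style distillation loss. The class and function names are assumptions for illustration, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleDilatedBlock(nn.Module):
    """Hypothetical SFEM-like block: parallel 3x3 convolutions with
    different dilation rates enlarge the receptive field and capture
    shallow features at several scales without adding kernel parameters
    per scale; a 1x1 convolution fuses the concatenated responses."""
    def __init__(self, in_channels: int, out_channels: int, dilations=(1, 2, 4)):
        super().__init__()
        branch_channels = out_channels // len(dilations)
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_channels, branch_channels, kernel_size=3,
                          padding=d, dilation=d, bias=False),
                nn.BatchNorm2d(branch_channels),
                nn.ReLU(inplace=True),
            )
            for d in dilations
        ])
        self.fuse = nn.Conv2d(branch_channels * len(dilations), out_channels,
                              kernel_size=1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))


def kd_loss(student_logits, teacher_logits, labels, T: float = 4.0, alpha: float = 0.9):
    """Standard knowledge-distillation objective: a temperature-softened
    KL term between teacher and student plus the usual cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```

Dilation is used here because it widens the receptive field at a fixed 3x3 kernel cost, which matches the abstract's claim of richer shallow, multi-scale context without a larger parameter budget.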
