The impact of integrating shallow and deep information on knowledge distillation
Author(s) -
Yilin Miao,
Yuhong Tang,
Huangliang Ren,
Jianjun Li
Publication year - 2025
Publication title -
IEEE Access
Language(s) - English
Resource type - Magazines
SCImago Journal Rank - 0.587
H-Index - 127
eISSN - 2169-3536
DOI - 10.1109/access.2025.3571732
Subject(s) - aerospace , bioengineering , communication, networking and broadcast technologies , components, circuits, devices and systems , computing and processing , engineered materials, dielectrics and plasmas , engineering profession , fields, waves and electromagnetics , general topics for engineers , geoscience , nuclear engineering , photonics and electrooptics , power, energy and industry applications , robotics and control systems , signal processing and analysis , transportation
Knowledge distillation is a key technique for compressing neural networks, leveraging insights from a large teacher model to enhance the generalization capability of a smaller student model. The ResNet family, recognized for its residual connections, is widely used across visual tasks: it mitigates gradient vanishing and degradation in deep neural networks, making training more manageable, and its strong feature extraction makes it a common backbone for knowledge distillation. However, our experimental findings reveal that ResNet models fall short in extracting shallow information from images, which in turn limits the performance of the corresponding student model. To address this, we propose a shallow feature extraction module (SFEM) that enlarges the receptive field through dilated convolutions and captures information at multiple scales, while reducing the parameter count and computational cost. In addition, to enhance the model's sensitivity to both horizontal and vertical directions, we introduce a full-dimensional perceptual (FDP) attention mechanism. Experimental results on the CIFAR-100 dataset show that, compared to the baseline, our method improves student performance under same-architecture guidance by 1.73%, 1.41%, 1.16%, and 0.72%, respectively, and under cross-architecture guidance by 2.27% and 1.51%, respectively. Compared to state-of-the-art knowledge distillation methods (e.g., RKD and VID), our student model improves by 1.03% and 1.56%, respectively.
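The abstract describes two ingredients: a shallow feature extraction module built on multi-scale dilated convolutions, and a teacher-student distillation objective. The paper's exact SFEM and FDP attention designs are not given here, so the sketch below is only a minimal, hypothetical PyTorch illustration of the general ideas: a multi-branch dilated-convolution block (a stand-in for SFEM) and the standard Hinton-style distillation loss. The class and function names are assumptions for illustration, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleDilatedBlock(nn.Module):
    """Hypothetical SFEM-like block: parallel 3x3 convolutions with
    different dilation rates enlarge the receptive field and capture
    shallow features at several scales without adding kernel parameters
    per scale; a 1x1 convolution fuses the concatenated responses."""
    def __init__(self, in_channels: int, out_channels: int, dilations=(1, 2, 4)):
        super().__init__()
        branch_channels = out_channels // len(dilations)
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_channels, branch_channels, kernel_size=3,
                          padding=d, dilation=d, bias=False),
                nn.BatchNorm2d(branch_channels),
                nn.ReLU(inplace=True),
            )
            for d in dilations
        ])
        self.fuse = nn.Conv2d(branch_channels * len(dilations), out_channels,
                              kernel_size=1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))


def kd_loss(student_logits, teacher_logits, labels, T: float = 4.0, alpha: float = 0.9):
    """Standard knowledge-distillation objective: a temperature-softened
    KL term between teacher and student plus the usual cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```

Dilation is used here because it widens the receptive field at a fixed 3x3 kernel cost, which matches the abstract's claim of richer shallow, multi-scale context without a larger parameter budget.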
