
Image Classification Model Based on Contrastive Learning with Dynamic Adaptive Loss
Author(s) -
Quandeng Gou,
Jingxuan Zhou,
Zi Li,
Fangrui Zhang,
Yuheng Ren
Publication year - 2025
Publication title -
IEEE Access
Language(s) - English
Resource type - Magazines
SCImago Journal Rank - 0.587
H-Index - 127
eISSN - 2169-3536
DOI - 10.1109/access.2025.3574335
Subject(s) - aerospace, bioengineering, communication, networking and broadcast technologies, components, circuits, devices and systems, computing and processing, engineered materials, dielectrics and plasmas, engineering profession, fields, waves and electromagnetics, general topics for engineers, geoscience, nuclear engineering, photonics and electrooptics, power, energy and industry applications, robotics and control systems, signal processing and analysis, transportation
As one of the core tasks in visual recognition, image classification is widely used in a variety of scenarios. Most existing mainstream image classification models use a Convolutional Neural Network (CNN), a Transformer, or a combination of the two as the backbone. However, the convolutional operation of a CNN relies on local receptive fields, which limits its ability to capture global information about the image. The Transformer typically partitions the image into equal-sized, non-overlapping patches, which may break the continuity at patch boundaries and lead to the loss of critical information. In addition, an ill-designed combination of CNN and Transformer may significantly weaken the model’s ability to express local features. A new image classification model is proposed to address these problems. Its backbone places a CNN and a Transformer in tandem, exploiting the CNN’s strength in local feature extraction while fully utilizing the Transformer’s ability to process global image information, thereby deepening the model’s understanding of the image’s semantic relationships. Specifically, the proposed model contains two branches: contrastive learning and classification. The contrastive learning branch introduces a contrastive learning strategy to enhance the distinguishability between features of different samples, compensating for the model’s reliance on classification supervision signals alone. The classification branch follows traditional image classification methods but replaces the standard cross-entropy loss with a dynamic adaptive loss, enabling the model to learn hard-to-distinguish features effectively while also mitigating the phenomenon in which samples that are easily classified early in training become indistinguishable as training progresses.
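The two training signals in the abstract can be sketched as follows. This is a minimal NumPy illustration, assuming an InfoNCE-style loss for the contrastive branch and a focal-style confidence re-weighting of cross-entropy for the dynamic adaptive loss; the paper’s exact formulations, hyperparameters, and weighting schedule may differ.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def contrastive_loss(z1, z2, temperature=0.1):
    """InfoNCE-style loss (assumption): two embeddings of the same image
    (rows of z1 and z2 with matching indices) are positives; all other
    pairs in the batch are negatives."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)  # L2-normalize
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature       # (N, N) cosine similarities
    log_probs = np.log(softmax(logits, axis=1))
    idx = np.arange(len(z1))               # positives lie on the diagonal
    return -log_probs[idx, idx].mean()

def dynamic_adaptive_ce(logits, labels, gamma=2.0):
    """Focal-style re-weighted cross-entropy (assumption): the weight
    (1 - p_t)^gamma grows as the model's confidence p_t in the true class
    shrinks, so hard samples dominate the gradient."""
    probs = softmax(logits, axis=1)
    p_t = probs[np.arange(len(labels)), labels]  # prob. of the true class
    return (-((1.0 - p_t) ** gamma) * np.log(p_t)).mean()
```

In a two-branch setup like the one described, the total training objective would typically be a weighted sum of the two terms, e.g. `loss = dynamic_adaptive_ce(logits, y) + lam * contrastive_loss(z1, z2)`, with `lam` a balancing hyperparameter (an assumption here, not stated in the abstract).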
Experimental results on the ImageNet-1k dataset show that the proposed model achieves a Top-1 accuracy of 82.8%, outperforming existing image classification models based on CNNs, Transformers, or their combinations. Notably, the model attains this accuracy while maintaining a relatively small parameter size (23.9 MB) and low computational complexity (5.7 MFLOPs). These results demonstrate the effectiveness and feasibility of the proposed method.