Open Access
Swin Transformer with Late-Fusion Feature Aggregation for Multi-Modal Vehicle Reidentification
Author(s) - Reza Fuad Rachmadi, Supeno Mardi Susiki Nugroho, I Ketut Eddy Purnama
Publication year - 2025
Publication title - IEEE Access
Language(s) - English
Resource type - Magazines
SCImago Journal Rank - 0.587
H-Index - 127
eISSN - 2169-3536
DOI - 10.1109/access.2025.3591251
Subject(s) - aerospace, bioengineering, communication, networking and broadcast technologies, components, circuits, devices and systems, computing and processing, engineered materials, dielectrics and plasmas, engineering profession, fields, waves and electromagnetics, general topics for engineers, geoscience, nuclear engineering, photonics and electrooptics, power, energy and industry applications, robotics and control systems, signal processing and analysis, transportation
Vehicle reidentification is a problem that has existed since the advent of CCTV technology for road monitoring. Vehicle images captured in low-light environments are especially challenging for reidentification, so multi-modal data (visible, near-infrared, and thermal) is often used to improve model performance. In this paper, we propose SAFA (Self-Attention Feature Aggregation), a Swin Transformer classifier with late-fusion feature aggregation networks, for multi-modal vehicle reidentification. The SAFA classifier attaches multi-head networks with CBAM (Convolutional Block Attention Module) and MWN (Modality Weighted Network) to three parallel, shared Swin Transformer backbones, one for each modality input (visible, near-infrared, and thermal). We evaluated the proposed classifier on three multi-modal vehicle reidentification datasets: RGBN300, RGBNT100, and WMVeID863. It achieves an mAP of 85.1% on RGBN300, 88.4% on RGBNT100, and 70.5% on WMVeID863. Further analysis using t-SNE and Grad-CAM visualizations shows that the proposed classifier effectively distinguishes different vehicle IDs by extracting discriminative features, with the vehicle's headlights and taillights being the main regions the SAFA network attends to.
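The abstract does not say how the MWN computes its modality weights, so the following is only a minimal pure-Python sketch of the general idea of modality-weighted late fusion, under stated assumptions: each modality branch yields one embedding vector, a hypothetical linear `scorer` vector assigns one score per modality, a softmax turns those scores into modality weights, and the weighted sum of the unit-normalized embeddings is the fused descriptor. It then ranks a gallery by cosine similarity and computes mean average precision (mAP), the metric reported above. All names (`fuse_modalities`, `scorer`, `mean_average_precision`, and so on) are hypothetical; this is not the authors' MWN or CBAM implementation, and standard reID evaluation protocols usually also exclude same-camera gallery images, which this sketch omits.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_modalities(features: Dict[str, Sequence[float]], scorer: Sequence[float]) -> Vector:
    """Hypothetical modality-weighted late fusion (not the paper's MWN).

    Each modality embedding is unit-normalized and scored with the linear
    `scorer` vector; softmax over the scores gives per-modality weights, and
    the weighted sum of embeddings is normalized into the fused descriptor.
    """
    names = sorted(features)  # fixed modality order, e.g. ["nir", "rgb", "tir"]
    embeddings = [l2_normalize(features[n]) for n in names]
    scores = [sum(w * x for w, x in zip(scorer, e)) for e in embeddings]
    weights = softmax(scores)
    dim = len(embeddings[0])
    fused = [sum(wt * e[i] for wt, e in zip(weights, embeddings)) for i in range(dim)]
    return l2_normalize(fused)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))


def average_precision(query: Sequence[float], gallery: List[Vector],
                      gallery_ids: List[int], query_id: int) -> float:
    """AP for one query: rank the gallery by cosine similarity, then average
    the precision at every rank where a correct (same-ID) match appears."""
    order = sorted(range(len(gallery)), key=lambda i: cosine(query, gallery[i]), reverse=True)
    hits, precisions = 0, []
    for rank, idx in enumerate(order, start=1):
        if gallery_ids[idx] == query_id:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(queries: List[Tuple[Vector, int]], gallery: List[Vector],
                           gallery_ids: List[int]) -> float:
    """mAP: the mean of the per-query average precisions."""
    aps = [average_precision(q, gallery, gallery_ids, qid) for q, qid in queries]
    return sum(aps) / len(aps)
```

For example, `fuse_modalities({"rgb": [1.0, 0.0], "nir": [0.0, 1.0]}, scorer=[10.0, 0.0])` gives almost all of the weight to the RGB embedding, because its score is much higher. Learning such weights per sample, rather than fixing them, is what lets a late-fusion head lean on near-infrared or thermal features when the visible image is dark.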
