Swin Transformer with Late-Fusion Feature Aggregation for Multi-Modal Vehicle Reidentification

Reza Fuad Rachmadi; Supeno Mardi Susiki Nugroho; I Ketut Eddy Purnama


Open Access

Publication year - 2025

Publication title - IEEE Access

Language(s) - English

Resource type - Magazines

SCImago Journal Rank - 0.587

H-Index - 127

eISSN - 2169-3536

DOI - 10.1109/access.2025.3591251

Subject(s) - aerospace, bioengineering, communication, networking and broadcast technologies, components, circuits, devices and systems, computing and processing, engineered materials, dielectrics and plasmas, engineering profession, fields, waves and electromagnetics, general topics for engineers, geoscience, nuclear engineering, photonics and electrooptics, power, energy and industry applications, robotics and control systems, signal processing and analysis, transportation

Vehicle reidentification is a problem that has existed since the advent of CCTV technology for road monitoring. Vehicle images captured in low-light environments are very challenging for reidentification, so multi-modal data (visible, near-infrared, and thermal) is often used to improve model performance. In this paper, we propose SAFA (Self-Attention Feature Aggregation), a Swin Transformer classifier with late-fusion feature aggregation networks for multi-modal vehicle reidentification. The SAFA classifier is constructed by attaching multi-head networks with CBAM (Convolutional Block Attention Module) and MWN (Modality Weighted Network) to three parallel shared Swin Transformer branches, one for each input modality (visible, near-infrared, and thermal). We evaluate the classifier on three multi-modal vehicle reidentification datasets: RGBN300, RGBNT100, and WMVeID863. It achieves an mAP of 85.1% on RGBN300, 88.4% on RGBNT100, and 70.5% on WMVeID863. Further analysis using t-SNE and GradCAM visualization shows that the classifier effectively distinguishes different vehicle IDs by extracting strong features, with the headlight and backlight of the vehicle being the main regions extracted by the SAFA network.
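
The late-fusion idea in the abstract can be illustrated with a minimal sketch: per-modality feature vectors are computed separately and only combined at the end, using one weight per modality. This is not the authors' SAFA network. The random arrays stand in for Swin Transformer outputs, and `late_fusion`, `scorer`, and the softmax weighting are hypothetical names and choices made for illustration, not the paper's CBAM or MWN design.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    # Scale vectors to unit length so modalities are comparable.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def late_fusion(features, scorer):
    """Fuse per-modality features with per-sample modality weights.

    features: dict mapping modality name -> array of shape (batch, dim).
    scorer:   array of shape (dim,), a hypothetical scoring vector
              (an assumption, standing in for a learned weighting network).
    """
    names = sorted(features)
    # Shape (batch, modalities, dim).
    stacked = np.stack([l2_normalize(features[m]) for m in names], axis=1)
    # One scalar score per sample and modality, turned into weights.
    weights = softmax(stacked @ scorer, axis=1)
    # Weighted sum over the modality axis, then renormalize.
    fused = (weights[..., None] * stacked).sum(axis=1)
    return l2_normalize(fused), weights

# Stand-in features: random arrays instead of real backbone outputs.
rng = np.random.default_rng(0)
feats = {m: rng.normal(size=(4, 8))
         for m in ("visible", "near_infrared", "thermal")}
scorer = rng.normal(size=8)
fused, weights = late_fusion(feats, scorer)
```

Here `fused` holds one unit-length descriptor per vehicle image triplet, and each row of `weights` sums to 1 across the three modalities. In a reidentification setting, such descriptors would typically be compared by cosine similarity to rank gallery images.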
