Open Access
Attention Source Device Identification Using Audio Content from Videos and Grad-CAM Explanations
Author(s) - Christos Korgialas, Constantine Kotropoulos
Publication year - 2025
Publication title - IEEE Open Journal of Signal Processing
Language(s) - English
Resource type - Magazines
eISSN - 2644-1322
DOI - 10.1109/ojsp.2025.3620713
Subject(s) - signal processing and analysis
An approach to Source Device Identification (SDI) is proposed, leveraging a Residual Network (ResNet) architecture enhanced with the Convolutional Block Attention Module (CBAM). The approach employs log-Mel spectrograms of audio content from videos in the VISION dataset captured by 35 different devices. A content-disjoint evaluation protocol is applied at the recording level to eliminate content bias across splits, supported by fixed-length segmentation and structured patch extraction for input generation. Moreover, Gradient-weighted Class Activation Mapping (Grad-CAM) is exploited to highlight the spectrogram regions that contribute most to the identification process, thus enabling interpretability. Quantitatively, the CBAM ResNet model is compared with existing methods, demonstrating an increased SDI accuracy across scenarios, including flat, indoor, and outdoor environments. A statistical significance test is conducted to assess the SDI accuracies, while an ablation study is performed to analyze the effect of attention mechanisms on the proposed model's performance. Additional evaluations are performed using the FloreView and POLIPHONE datasets to validate the model's generalization capabilities across unseen devices via transfer learning, assessing robustness under various conditions.
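The attention mechanism named in the abstract, the Convolutional Block Attention Module (CBAM), applies channel attention followed by spatial attention to a feature map. As an illustration only, the following is a minimal PyTorch sketch of a standard CBAM block as commonly described in the literature; it is not the authors' exact architecture, and the `reduction` and `kernel_size` values are generic defaults, not parameters reported in the paper.

```python
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """Reweight channels using pooled (avg + max) descriptors and a shared MLP."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))   # global average pooling per channel
        mx = self.mlp(x.amax(dim=(2, 3)))    # global max pooling per channel
        scale = torch.sigmoid(avg + mx).view(b, c, 1, 1)
        return x * scale


class SpatialAttention(nn.Module):
    """Reweight spatial locations using channel-pooled maps and a conv layer."""

    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        avg = x.mean(dim=1, keepdim=True)         # average across channels
        mx = x.amax(dim=1, keepdim=True)          # max across channels
        scale = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * scale


class CBAM(nn.Module):
    """Channel attention followed by spatial attention, as in the standard CBAM."""

    def __init__(self, channels: int, reduction: int = 16, kernel_size: int = 7):
        super().__init__()
        self.ca = ChannelAttention(channels, reduction)
        self.sa = SpatialAttention(kernel_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.sa(self.ca(x))


if __name__ == "__main__":
    # A CBAM block is shape-preserving, so it drops into a ResNet stage
    # (here, a dummy feature map standing in for a log-Mel spectrogram patch).
    feats = torch.randn(2, 32, 8, 8)
    out = CBAM(32)(feats)
    print(out.shape)  # torch.Size([2, 32, 8, 8])
```

Because both attention maps pass through a sigmoid, the block only rescales activations (each by a factor in (0, 1)) and never changes the tensor shape, which is what lets it be inserted into residual blocks without altering the rest of the network.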
