Open Access
Multi-modal Comparative Analysis on Audio Dub Detection using Artificial Intelligence
Author(s) -
Divya Jennifer Dsouza,
Anisha P Rodrigues,
Roshan Fernandes
Publication year - 2025
Publication title -
IEEE Access
Language(s) - English
Resource type - Magazines
SCImago Journal Rank - 0.587
H-Index - 127
eISSN - 2169-3536
DOI - 10.1109/access.2025.3591306
Subject(s) - aerospace, bioengineering, communication, networking and broadcast technologies, components, circuits, devices and systems, computing and processing, engineered materials, dielectrics and plasmas, engineering profession, fields, waves and electromagnetics, general topics for engineers, geoscience, nuclear engineering, photonics and electrooptics, power, energy and industry applications, robotics and control systems, signal processing and analysis, transportation
Audio dubbing, the insertion of fake audio clips into a genuine video or audio file, has shown growing adverse effects in speech-related sectors, raising concerns for individuals, organizations, and society as a whole. With the advent and rapid rise of artificial intelligence (AI), producing such fake clips has become possible through generative AI that enables voice cloning and real-time voice conversion from one individual to another. Misuse of this technology can breach privacy and cause misrepresentation, creating an urgent need for real-time detection of AI-generated audio dubs and for fake-audio identification and prevention. This research aims to detect dubs in audio files using machine learning and deep learning techniques in several modes of execution: offline (static), batch, and online (streaming). The audio files are segmented using librosa, the Python package for music and audio analysis. The River framework operates in an online mode, where processing is done faster using dictionaries with low storage consumption, making it better suited to time-critical real-time applications. Incremental processing yields improved results in real-time audio applications by enabling low-latency responses and adaptive system behavior. The proposed hybrid deep neural network (hyDNN) achieved an accuracy of 99.19% on custom data in incremental mode.
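The pipeline the abstract outlines — segment the audio, extract per-frame features as dictionaries, and train a classifier one sample at a time — can be sketched as below. This is a dependency-light illustration, not the paper's method: the frame slicing mimics what librosa provides, the dictionary-of-features sample format mirrors River's streaming API, and the tiny online Gaussian naive Bayes stands in for the proposed hyDNN. The feature choices (frame energy, zero-crossing rate), the segment lengths, and the "genuine"/"dubbed" labels are illustrative assumptions.

```python
import numpy as np

def frame_signal(y, frame_length=2048, hop_length=512):
    """Slice a 1-D signal into overlapping frames (in the spirit of librosa.util.frame)."""
    n_frames = 1 + (len(y) - frame_length) // hop_length
    return np.stack([y[i * hop_length : i * hop_length + frame_length]
                     for i in range(n_frames)])

def features(frame):
    """Per-frame features as a dict, mirroring River's dictionary-based samples."""
    return {
        "energy": float(np.mean(frame ** 2)),
        "zcr": float(np.mean(np.abs(np.diff(np.sign(frame))) > 0)),
    }

class StreamingGaussianNB:
    """Minimal online Gaussian naive Bayes: per-class running mean/variance
    updated one sample at a time (Welford's algorithm), stored in dicts."""
    def __init__(self):
        self.stats = {}  # label -> {feature: (count, mean, M2)}

    def learn_one(self, x, y):
        cls = self.stats.setdefault(y, {})
        for k, v in x.items():
            n, mean, m2 = cls.get(k, (0, 0.0, 0.0))
            n += 1
            delta = v - mean
            mean += delta / n
            m2 += delta * (v - mean)
            cls[k] = (n, mean, m2)

    def predict_one(self, x):
        best, best_ll = None, -np.inf
        for y, cls in self.stats.items():
            ll = 0.0
            for k, v in x.items():
                n, mean, m2 = cls[k]
                var = max(m2 / n if n > 1 else 1.0, 1e-9)
                ll += -0.5 * (np.log(2 * np.pi * var) + (v - mean) ** 2 / var)
            if ll > best_ll:
                best, best_ll = y, ll
        return best

# Illustrative stream: a quiet tone ("genuine") vs. louder noise ("dubbed").
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 22050)
genuine = 0.1 * np.sin(2 * np.pi * 220 * t)
dubbed = 0.5 * rng.standard_normal(22050)

model = StreamingGaussianNB()
for signal, label in [(genuine, "genuine"), (dubbed, "dubbed")]:
    for frame in frame_signal(signal):
        model.learn_one(features(frame), label)  # one sample at a time

print(model.predict_one(features(frame_signal(dubbed)[0])))
```

The dictionary-based state here is the design point the abstract highlights for the online mode: each incoming frame updates a small per-class dict in constant time and memory, so the model can respond with low latency and adapt as the stream drifts.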

