Open Access
Multi-modal Comparative Analysis on Audio Dub Detection using Artificial Intelligence
Author(s) -
Divya Jennifer Dsouza,
Anisha P Rodrigues,
Roshan Fernandes
Publication year - 2025
Publication title -
IEEE Access
Language(s) - English
Resource type - Magazines
SCImago Journal Rank - 0.587
H-Index - 127
eISSN - 2169-3536
DOI - 10.1109/access.2025.3591306
Subject(s) - aerospace, bioengineering, communication, networking and broadcast technologies, components, circuits, devices and systems, computing and processing, engineered materials, dielectrics and plasmas, engineering profession, fields, waves and electromagnetics, general topics for engineers, geoscience, nuclear engineering, photonics and electrooptics, power, energy and industry applications, robotics and control systems, signal processing and analysis, transportation
Audio dubbing, the insertion of fake audio clips into a genuine video or audio file, has shown growing adverse effects in speech-related sectors, raising concerns for individuals, organizations, and society as a whole. With the advent and rapid rise of artificial intelligence (AI), producing such fake clips has become possible through generative AI that enables voice cloning and real-time voice conversion from one individual to another. Misuse of this technology can breach privacy and cause misrepresentation, creating an urgent need for real-time detection of AI-generated audio dubs and for fake-audio identification and prevention. This research aims to detect dubs in audio files using machine learning and deep learning techniques in several modes of execution: offline (static), batch, and online (streaming). The audio files are segmented using librosa, the Python package for music and audio analysis. The River framework operates in an online mode, where processing is done faster using dictionaries with low storage consumption, making it better suited to time-critical real-time applications. Incremental processing yields improved results in real-time audio applications by enabling low-latency responses and adaptive system behavior. The proposed hybrid deep neural network (hyDNN) achieved an accuracy of 99.19% on custom data in incremental mode.
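The pipeline the abstract outlines — segment the audio, extract per-frame features as dictionaries, and train a classifier one sample at a time — can be sketched as below. This is a dependency-light illustration, not the paper's method: the frame slicing mimics what librosa provides, the dictionary-of-features sample format mirrors River's streaming API, and the tiny online Gaussian naive Bayes stands in for the proposed hyDNN. The feature choices (frame energy, zero-crossing rate), the segment lengths, and the "genuine"/"dubbed" labels are illustrative assumptions.

```python
import numpy as np

def frame_signal(y, frame_length=2048, hop_length=512):
    """Slice a 1-D signal into overlapping frames (in the spirit of librosa.util.frame)."""
    n_frames = 1 + (len(y) - frame_length) // hop_length
    return np.stack([y[i * hop_length : i * hop_length + frame_length]
                     for i in range(n_frames)])

def features(frame):
    """Per-frame features as a dict, mirroring River's dictionary-based samples."""
    return {
        "energy": float(np.mean(frame ** 2)),
        "zcr": float(np.mean(np.abs(np.diff(np.sign(frame))) > 0)),
    }

class StreamingGaussianNB:
    """Minimal online Gaussian naive Bayes: per-class running mean/variance
    updated one sample at a time (Welford's algorithm), stored in dicts."""
    def __init__(self):
        self.stats = {}  # label -> {feature: (count, mean, M2)}

    def learn_one(self, x, y):
        cls = self.stats.setdefault(y, {})
        for k, v in x.items():
            n, mean, m2 = cls.get(k, (0, 0.0, 0.0))
            n += 1
            delta = v - mean
            mean += delta / n
            m2 += delta * (v - mean)
            cls[k] = (n, mean, m2)

    def predict_one(self, x):
        best, best_ll = None, -np.inf
        for y, cls in self.stats.items():
            ll = 0.0
            for k, v in x.items():
                n, mean, m2 = cls[k]
                var = max(m2 / n if n > 1 else 1.0, 1e-9)
                ll += -0.5 * (np.log(2 * np.pi * var) + (v - mean) ** 2 / var)
            if ll > best_ll:
                best, best_ll = y, ll
        return best

# Illustrative stream: a quiet tone ("genuine") vs. louder noise ("dubbed").
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 22050)
genuine = 0.1 * np.sin(2 * np.pi * 220 * t)
dubbed = 0.5 * rng.standard_normal(22050)

model = StreamingGaussianNB()
for signal, label in [(genuine, "genuine"), (dubbed, "dubbed")]:
    for frame in frame_signal(signal):
        model.learn_one(features(frame), label)  # one sample at a time

print(model.predict_one(features(frame_signal(dubbed)[0])))
```

The dictionary-based state here is the design point the abstract highlights for the online mode: each incoming frame updates a small per-class dict in constant time and memory, so the model can respond with low latency and adapt as the stream drifts.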

