
Adding Crowd Noise to Sports Commentary using Generative Models
Author(s) - Neil Shah, Dharmeshkumar Agrawal, Niranjan Pedanekar
Publication year - 2021
Language(s) - English
Resource type - Conference proceedings
DOI - 10.5753/lique.2021.15715
Subject(s) - computer science, stadium, unavailability, noise (video), sound (geography), generative grammar, similarity (geometry), speech recognition, human–computer interaction, artificial intelligence, acoustics, engineering, mathematics, physics, geometry, reliability engineering, image (mathematics)
Crowd noise forms an integral part of a live sports experience. In the post-COVID era, when live audiences are absent, crowd noise needs to be added to the live commentary. This paper exploits the correlation between the commentary and the crowd noise of a live sports event and presents a method for stylizing sports commentary audio by generating live stadium-like sound with neural generative models. We use Generative Adversarial Network (GAN)-based architectures such as Cycle-consistent GANs (Cycle-GANs) and Mel-GANs to generate live stadium-like sound samples from the live commentary. Because raw commentary recordings are unavailable, we use end-to-end time-domain source separation models (SEGAN and Wave-U-Net) to extract the commentary from combined live recordings taken from YouTube highlight videos of soccer matches. We present a qualitative evaluation and a subjective user evaluation of the similarity between the generated live sound and the reference live sound.
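The abstract names Cycle-GANs but does not state the training objective. As a rough illustration only, the sketch below computes the L1 cycle-consistency term commonly used with Cycle-GANs, on random arrays standing in for spectrograms. The function name, the array shapes, and the toy "generators" are all assumptions made for this example; they are not the authors' models or data.

```python
import numpy as np


def cycle_consistency_loss(x, y, g_xy, g_yx):
    """Mean L1 cycle loss: |G_yx(G_xy(x)) - x| + |G_xy(G_yx(y)) - y|."""
    x_cycled = g_yx(g_xy(x))  # commentary -> live-like -> back to commentary
    y_cycled = g_xy(g_yx(y))  # live -> commentary-like -> back to live
    return float(np.mean(np.abs(x_cycled - x)) + np.mean(np.abs(y_cycled - y)))


# Hypothetical stand-ins for batches of spectrograms: (batch, bins, frames).
rng = np.random.default_rng(0)
commentary = rng.standard_normal((4, 80, 128))
live = rng.standard_normal((4, 80, 128))


# Toy generators (assumptions for illustration): a pair of exact inverses.
def to_live(s):
    return 2.0 * s + 1.0


def to_commentary(s):
    return (s - 1.0) / 2.0


# Inverse mappings reconstruct their inputs, so the cycle loss is near zero.
loss_inverse = cycle_consistency_loss(commentary, live, to_live, to_commentary)

# Replacing one mapping with the identity breaks the cycle and raises the loss.
loss_mismatch = cycle_consistency_loss(commentary, live, to_live, lambda s: s)

print(loss_inverse, loss_mismatch)
```

In an actual Cycle-GAN this term would be added to the adversarial losses of two learned generators; here the closed-form mappings only show why the term rewards mappings that undo each other.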