Adding Crowd Noise to Sports Commentary using Generative Models
Author(s) -
Neil Shah,
Dharmeshkumar Agrawal,
Niranjan Pedanekar
Publication year - 2021
Language(s) - English
Resource type - Conference proceedings
DOI - 10.5753/lique.2021.15715
Subject(s) - computer science, stadium, unavailability, noise (video), sound (geography), generative grammar, similarity (geometry), speech recognition, human–computer interaction, artificial intelligence, acoustics, engineering, mathematics, physics, geometry, reliability engineering, image (mathematics)
Crowd noise forms an integral part of the live sports experience. In the post-COVID era, when live audiences are absent, crowd noise needs to be added to the live commentary. This paper exploits the correlation between the commentary and the crowd noise of a live sports event and presents a method for stylizing sports commentary audio by generating live stadium-like sound with neural generative models. We use Generative Adversarial Network (GAN)-based architectures, such as Cycle-consistent GANs (Cycle-GANs) and Mel-GANs, to generate live stadium-like sound samples from the live commentary. Because raw commentary sound samples are unavailable, we use end-to-end time-domain source separation models (SEGAN and Wave-U-Net) to extract the commentary from combined live-sound recordings acquired from YouTube highlights of soccer videos. We present a qualitative evaluation and a subjective user evaluation of the similarity between the generated live sound and the reference live sound.
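As a rough illustration of the cycle-consistency idea behind the Cycle-GANs named in the abstract, the sketch below computes an L1 cycle loss between two domains (commentary and live sound). It is not the authors' implementation: the function name `cycle_consistency_loss`, the 80 x 100 spectrogram shape, and the toy affine "generators" are all assumptions made only so the example runs on its own.

```python
import numpy as np

def cycle_consistency_loss(x, y, g_xy, g_yx):
    """L1 cycle loss: mapping x to the other domain and back should
    recover x, and likewise for y."""
    forward = np.mean(np.abs(g_yx(g_xy(x)) - x))   # x -> y-domain -> x
    backward = np.mean(np.abs(g_xy(g_yx(y)) - y))  # y -> x-domain -> y
    return forward + backward

# Hypothetical stand-ins for commentary and live-sound spectrograms
# (80 mel bins x 100 frames is an assumed shape, not from the paper).
rng = np.random.default_rng(0)
commentary = rng.standard_normal((80, 100))
live = rng.standard_normal((80, 100))

# Placeholder "generators": simple affine maps, not neural networks.
def to_live(s):
    return 2.0 * s + 1.0

def to_commentary(s):
    return (s - 1.0) / 2.0  # exact inverse of to_live

# Exact inverses give a near-zero cycle loss.
loss_inverse = cycle_consistency_loss(commentary, live, to_live, to_commentary)

# Replacing one generator with the identity breaks the cycle.
loss_mismatch = cycle_consistency_loss(commentary, live, to_live, lambda s: s)
```

In an actual Cycle-GAN this term is added to the adversarial losses of the two generators; here it is isolated to show only what "cycle-consistent" means.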