Open Access
ANIM-400K: A Large-Scale Dataset for Automated End-To-End Dubbing of Video
Author(s)
Kevin Cai,
Chonghua Liu,
David M. Chan
Publication year: 2024
The Internet's wealth of content, with up to 60% published in English, starkly contrasts with the global population, of which only 18.8% are English speakers and just 5.1% consider it their native language, leading to disparities in online information access. Unfortunately, automated dubbing of video - replacing the audio track of a video with a translated alternative - remains a complex and challenging task due to its reliance on multi-stage pipelines, which necessitate precise timing, facial movement synchronization, and prosody matching. While end-to-end dubbing offers a solution, data scarcity continues to impede the progress of both end-to-end and pipeline-based methods. In this work, we introduce Anim-400K, a comprehensive dataset of over 425K aligned animated video segments in Japanese and English supporting various video-related tasks, including automated dubbing, simultaneous translation, guided video summarization, and genre/theme/style classification. Our dataset is made publicly available for research purposes at https://github.com/davidmchan/Anim400K.
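The abstract describes aligned Japanese/English segments as the dataset's core unit. The sketch below illustrates, in plain Python, how such per-language records might be grouped into cross-lingual training pairs by a shared segment identifier. The field names (`segment_id`, `lang`, `text`) and sample values are illustrative assumptions, not the actual Anim-400K schema; consult the repository linked above for the real format.

```python
# Hypothetical sketch: grouping per-language segment records into
# aligned (Japanese, English) pairs keyed by a shared segment id.
# Field names are assumptions for illustration, NOT the Anim-400K schema.

from collections import defaultdict

# Toy records standing in for dataset metadata entries.
segments = [
    {"segment_id": "ep01_0001", "lang": "ja", "text": "こんにちは"},
    {"segment_id": "ep01_0001", "lang": "en", "text": "Hello"},
    {"segment_id": "ep01_0002", "lang": "ja", "text": "ありがとう"},
    {"segment_id": "ep01_0002", "lang": "en", "text": "Thank you"},
]

def build_pairs(records):
    """Group per-language records by segment id into (ja, en) pairs,
    keeping only segments that are aligned in both languages."""
    by_id = defaultdict(dict)
    for rec in records:
        by_id[rec["segment_id"]][rec["lang"]] = rec["text"]
    return {
        sid: (langs["ja"], langs["en"])
        for sid, langs in by_id.items()
        if "ja" in langs and "en" in langs
    }

pairs = build_pairs(segments)
print(len(pairs))  # → 2
```

Keying on a segment identifier rather than on timestamps is one simple design choice for this kind of pairing; a real loader would also carry audio/video paths and timing information per segment.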
Language(s): English