Who's Speaking?
Author(s) - Punarjay Chakravarty, Sayeh Mirzaei, Tinne Tuytelaars, Hugo Van hamme
Publication year - 2015
Publication title - Lirias (KU Leuven)
Language(s) - English
Resource type - Conference proceedings
DOI - 10.1145/2818346.2820780
Subject(s) - computer science, movement (music), microphone, microphone array, speech recognition, computer vision, artificial intelligence, head (geology), acoustics, telecommunications, physics, sound pressure, geomorphology, geology
Active speakers have traditionally been identified in video by detecting their moving lips. This paper demonstrates that active speakers can also be identified using spatio-temporal features that aim to capture other cues: movement of the head, upper body, and hands. Speaker directional information, obtained by sound source localization from a microphone array, is used to supervise the training of these video features.
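The audio-supervision idea in the abstract can be illustrated with a minimal sketch. It assumes that a sound source localizer already reports one direction-of-arrival angle per video frame (or `None` when no speech is detected), and that each tracked person has a known angular position in the same coordinate frame. The function name `label_active_speakers`, the `tolerance_deg` threshold, and the example angles are all hypothetical and are not taken from the paper; the abstract does not describe the actual pipeline at this level of detail.

```python
from typing import Dict, List, Optional


def label_active_speakers(
    doa_deg: List[Optional[float]],
    person_angles_deg: List[Dict[str, float]],
    tolerance_deg: float = 10.0,  # hypothetical matching threshold
) -> List[Dict[str, int]]:
    """For each frame, label every tracked person 1 (speaking) or 0 (silent).

    doa_deg[i] is the audio direction of arrival in frame i, or None when
    no sound source was localized. person_angles_deg[i] maps a person id
    to that person's angular position in frame i.
    """
    labels = []
    for doa, angles in zip(doa_deg, person_angles_deg):
        # Everyone starts as "not speaking" in this frame.
        frame_labels = {name: 0 for name in angles}
        if doa is not None and angles:
            # Pick the person whose position is closest to the audio direction.
            nearest = min(angles, key=lambda name: abs(angles[name] - doa))
            # Accept the match only if it falls within the tolerance.
            if abs(angles[nearest] - doa) <= tolerance_deg:
                frame_labels[nearest] = 1
        labels.append(frame_labels)
    return labels


# Illustrative input: two people at fixed angles, four frames of audio.
doa = [-20.0, None, 31.0, 80.0]
persons = [{"A": -22.0, "B": 30.0} for _ in doa]
labels = label_active_speakers(doa, persons)
```

Labels produced this way could then serve as weak training targets for a classifier over per-person video features, so that no manual annotation of who is speaking is required. How the paper itself handles localization noise or overlapping speech is not stated in the abstract.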