Audiovisual localization of multiple speakers in a video teleconferencing setting
Author(s) - Kapralos Bill, Jenkin Michael R. M., Milios Evangelos
Publication year - 2003
Publication title - International Journal of Imaging Systems and Technology
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.359
H-Index - 47
eISSN - 1098-1098
pISSN - 0899-9457
DOI - 10.1002/ima.10045
Subject(s) - teleconference, computer science, videoconferencing, multimedia
Attending to multiple speakers in a video teleconferencing setting is a complex task. From a visual point of view, multiple speakers can occur at different locations and present radically different appearances. From an audio point of view, multiple speakers may be speaking at the same time, and background noise may make it difficult to localize sound sources without some a priori estimate of the sound source locations. This article presents a novel sensor and corresponding sensing algorithms to address the task of attending, simultaneously, to multiple speakers for video teleconferencing. A panoramic visual sensor captures a 360° view of the speakers in the environment, and potential speakers are identified in this view using a color histogram approach. A directional audio system based on beamforming is then used to confirm potential speakers and attend to them. An experimental evaluation of the sensor and its algorithms is presented, including sample results from the complete system in a teleconferencing setting.

© 2003 Wiley Periodicals, Inc. Int J Imaging Syst Technol 13: 95–105, 2003; Published online in Wiley InterScience (www.interscience.wiley.com).
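The abstract names a color-histogram detector followed by a beamforming confirmation step but does not give their details. The Python sketch below is a hypothetical illustration of how such a two-stage pipeline could be structured. It is not the authors' implementation. It assumes hue-only color histograms compared by histogram intersection, a linear mapping from panorama column to azimuth, and a far-field delay-and-sum beamformer with integer-sample delays. All function names (hue_histogram, candidate_azimuths, delay_and_sum_power, confirm_speakers) and the default thresholds are illustrative choices.

```python
# Hypothetical sketch of a "visual candidates, then audio confirmation" pipeline.
# Histogram intersection and delay-and-sum are assumed stand-ins, not the paper's method.
import colorsys
import math


def hue_histogram(pixels, bins=16):
    """Normalized hue histogram of (r, g, b) pixels with components in [0, 1]."""
    hist = [0.0] * bins
    for r, g, b in pixels:
        h, _, _ = colorsys.rgb_to_hsv(r, g, b)
        hist[min(int(h * bins), bins - 1)] += 1.0
    total = sum(hist)
    return [c / total for c in hist] if total else hist


def histogram_intersection(h1, h2):
    """Overlap of two normalized histograms: 1.0 = identical, 0.0 = disjoint."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def candidate_azimuths(panorama_columns, model_hist, threshold=0.5, bins=16):
    """Azimuths (degrees) of panorama columns whose colors match the model histogram.

    Assumes the column index maps linearly onto 0-360 degrees of azimuth.
    """
    width = len(panorama_columns)
    found = []
    for col, pixels in enumerate(panorama_columns):
        score = histogram_intersection(hue_histogram(pixels, bins), model_hist)
        if score >= threshold:
            found.append(360.0 * col / width)
    return found


def delay_and_sum_power(signals, mic_positions, azimuth_deg, fs, c=343.0):
    """Mean output power of a delay-and-sum beam steered to azimuth_deg.

    Assumes a far-field source, planar mic positions in metres, and
    integer-sample delays (no fractional-delay interpolation).
    """
    theta = math.radians(azimuth_deg)
    ux, uy = math.cos(theta), math.sin(theta)
    # A mic further along the look direction hears the wavefront earlier;
    # k is that lead in samples, undone by reading sig[n - k].
    shifts = [round(fs * (x * ux + y * uy) / c) for x, y in mic_positions]
    n_samples = min(len(s) for s in signals)
    lo, hi = max(shifts), n_samples + min(shifts)  # keep every n - k in range
    power = 0.0
    for n in range(lo, hi):
        y = sum(sig[n - k] for sig, k in zip(signals, shifts)) / len(signals)
        power += y * y
    return power / max(1, hi - lo)


def confirm_speakers(candidates, signals, mic_positions, fs, power_threshold):
    """Keep only visual candidates whose steered beam carries enough energy."""
    return [az for az in candidates
            if delay_and_sum_power(signals, mic_positions, az, fs) >= power_threshold]
```

In use, candidate_azimuths would run on each panoramic frame with a model histogram built from sample pixels of the target appearance. confirm_speakers would then discard candidates whose steered beam power falls below a threshold tuned to the room's noise floor. Delay-and-sum is used here only because it is the simplest beamformer. The paper's own audio processing may differ.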