Joint processing of audio and visual information for multimedia indexing and human-computer interaction
Author(s) -
Chalapathy Neti,
Benoît Maison,
Andrew W. Senior,
Giridharan Iyengar,
P. Decuetos,
Sankar Basu,
Ashish Verma
Publication year - 2000
Language(s) - English
DOI - 10.5555/2835865.2835896
Information fusion, in the context of combining multiple streams of data (e.g., audio and video streams) corresponding to the same perceptual process, is considered in a somewhat generalized setting. Specifically, we consider the problem of combining visual cues with audio signals to improve automatic machine recognition of descriptors, e.g., speech recognition/transcription, speaker change detection, speaker identification, and speaker event detection. These are important descriptors of multimedia content (video) for efficient search and retrieval. A general framework for treating all of these fusion problems in a unified setting is presented.