Joint processing of audio and visual information for multimedia indexing and human-computer interaction

Author(s) - Chalapathy Neti, Benoît Maison, Andrew W. Senior, Giridharan Iyengar, P. Decuetos, Sankar Basu, Ashish Verma

Publication year - 2000

Language(s) - English

DOI - 10.5555/2835865.2835896

Information fusion, in the context of combining multiple streams of data (e.g., audio and video streams) that correspond to the same perceptual process, is considered in a somewhat generalized setting. Specifically, we consider the problem of combining visual cues with audio signals to improve automatic machine recognition of descriptors, e.g., for speech recognition/transcription, speaker change detection, speaker identification, and speaker event detection. These are important descriptors of multimedia content (video) because they support efficient search and retrieval. We consider a general framework that treats all of these fusion problems in a unified setting.
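The abstract does not say which fusion rule the framework uses. As a hedged illustration of the kind of problem it describes, the sketch below shows one common decision-level (late) fusion scheme, not necessarily the authors' method: each stream scores every candidate class with a log-likelihood, the two scores are combined as lambda * log p(audio | c) + (1 - lambda) * log p(video | c), and the highest fused score wins. The function names, speaker labels, scores, and the default audio weight of 0.7 are all hypothetical choices for this example.

```python
from typing import Dict


def fuse_scores(audio_loglik: Dict[str, float],
                visual_loglik: Dict[str, float],
                audio_weight: float = 0.7) -> Dict[str, float]:
    """Hypothetical late fusion: weighted sum of per-class log-likelihoods.

    audio_weight plays the role of lambda; the visual stream gets 1 - lambda.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must be in [0, 1]")
    if audio_loglik.keys() != visual_loglik.keys():
        raise ValueError("both streams must score the same set of classes")
    visual_weight = 1.0 - audio_weight
    return {c: audio_weight * audio_loglik[c] + visual_weight * visual_loglik[c]
            for c in audio_loglik}


def classify(audio_loglik: Dict[str, float],
             visual_loglik: Dict[str, float],
             audio_weight: float = 0.7) -> str:
    """Return the class with the highest fused score."""
    fused = fuse_scores(audio_loglik, visual_loglik, audio_weight)
    return max(fused, key=fused.get)


if __name__ == "__main__":
    # Made-up scores: audio slightly prefers spk_B, video strongly prefers spk_A.
    audio = {"spk_A": -10.0, "spk_B": -9.5}
    visual = {"spk_A": -3.0, "spk_B": -8.0}
    print(classify(audio, visual, audio_weight=0.7))  # fused decision: spk_A
    print(classify(audio, visual, audio_weight=1.0))  # audio only: spk_B
```

Lowering the audio weight lets the visual stream override an ambiguous audio decision. In practice such a weight would typically be tuned on held-out data or adapted to the estimated acoustic noise level rather than fixed.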
