BPN Based Likelihood Ratio Score Fusion for Audio-Visual Speaker Identification in Response to Noise

Open Access

Author(s) - Md. Rabiul Islam, M. Abdus Sobhan

Publication year - 2014

Publication title - ISRN Artificial Intelligence

Language(s) - English

Resource type - Journals

eISSN - 2090-7443

pISSN - 2090-7435

DOI - 10.1155/2014/737814

Subject(s) - computer science, mel frequency cepstrum, speech recognition, artificial intelligence, pattern recognition (psychology), feature (linguistics), audio signal, noise (video), artificial neural network, hidden markov model, feature extraction, speech coding, image (mathematics), philosophy, linguistics

This paper presents an improved likelihood ratio score fusion technique, based on a back-propagation learning neural network (BPN), for audio-visual speaker identification in various noisy environments. Several signal preprocessing and noise removal techniques are used to process the speech utterance, and the LPC, LPCC, RCC, MFCC, ΔMFCC, and ΔΔMFCC methods are applied to extract features from the audio signal. An Active Shape Model is used to extract appearance-based and shape-based facial features. To enhance the performance of the proposed system, the appearance-based and shape-based facial features are concatenated, and Principal Component Analysis is used to reduce the dimension of the facial feature vector. The audio and visual feature vectors are then each fed to a Hidden Markov Model to obtain the log-likelihood of each modality. The reliability of each modality is calculated using a reliability measurement method. Finally, the integrated likelihood ratios are fed to the back-propagation learning neural network to produce the final speaker identification result. To measure the performance of the proposed system, three databases are used: the NOIZEUS speech database, the ORL face database, and the VALID audio-visual multimodal database, for audio-only, visual-only, and audio-visual speaker identification, respectively. To compare the accuracy of the proposed system with existing techniques under various noisy environments, different types of artificial noise are added at various rates to the audio and visual signals, and performance is compared across different variations of audio and visual features.
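
The abstract does not define the reliability measurement or how the likelihoods are integrated, so the following is only a minimal sketch of one common way such a stage can work, not the authors' method. It assumes each modality yields one HMM log-likelihood per enrolled speaker, takes reliability to be the average gap between the best score and the remaining scores, and turns the two reliabilities into fusion weights. The function names `reliability` and `fuse_scores` and all the score values are hypothetical. The back-propagation network stage is not reproduced; under these assumptions the fused scores would be its input.

```python
import numpy as np


def reliability(log_likelihoods):
    """Average gap between the best score and the other scores.

    Assumed measure: a modality whose best speaker model clearly
    outscores the rest is treated as more reliable.
    """
    ll = np.asarray(log_likelihoods, dtype=float)
    best = ll.max()
    # The best model contributes a gap of zero, so divide by (N - 1).
    return float((best - ll).sum() / (ll.size - 1))


def fuse_scores(audio_ll, visual_ll):
    """Combine per-speaker log-likelihoods with reliability-derived weights."""
    audio_ll = np.asarray(audio_ll, dtype=float)
    visual_ll = np.asarray(visual_ll, dtype=float)
    r_a, r_v = reliability(audio_ll), reliability(visual_ll)
    total = r_a + r_v
    # Fall back to equal weights when neither modality discriminates.
    w_a = 0.5 if total == 0 else r_a / total
    fused = w_a * audio_ll + (1.0 - w_a) * visual_ll
    return fused, w_a


# Hypothetical log-likelihoods for three enrolled speakers.
audio_ll = [-120.0, -118.0, -119.0]   # noisy audio: nearly flat scores
visual_ll = [-300.0, -340.0, -260.0]  # clean video: third speaker stands out

fused, w_a = fuse_scores(audio_ll, visual_ll)
predicted = int(np.argmax(fused))  # index of the highest fused score
```

In this made-up example the audio scores are nearly flat, so the audio weight `w_a` is small and the decision follows the visual modality. A noisy video with clean audio would shift the weight the other way.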
