Open Access
BPN Based Likelihood Ratio Score Fusion for Audio-Visual Speaker Identification in Response to Noise
Author(s) - Md. Rabiul Islam, M. Abdus Sobhan
Publication year - 2014
Publication title - ISRN Artificial Intelligence
Language(s) - English
Resource type - Journals
eISSN - 2090-7443
pISSN - 2090-7435
DOI - 10.1155/2014/737814
Subject(s) - computer science, mel frequency cepstrum, speech recognition, artificial intelligence, pattern recognition (psychology), feature (linguistics), audio signal, noise (video), artificial neural network, hidden markov model, feature extraction, speech coding, image (mathematics), philosophy, linguistics
This paper presents an improved likelihood ratio score fusion technique, based on a back-propagation neural network (BPN), for audio-visual speaker identification in various noisy environments. Several signal preprocessing and noise removal techniques are used to process the speech utterance, and the LPC, LPCC, RCC, MFCC, ΔMFCC, and ΔΔMFCC methods are applied to extract features from the audio signal. An Active Shape Model is used to extract appearance-based and shape-based facial features. To improve the performance of the proposed system, the appearance-based and shape-based facial features are concatenated, and Principal Component Analysis is used to reduce the dimension of the resulting facial feature vector. The audio and visual feature vectors are then fed separately to Hidden Markov Models to obtain the log-likelihood of each modality. The reliability of each modality is calculated using a reliability measurement method. Finally, the integrated likelihood ratios are fed to a back-propagation neural network, which produces the final speaker identification result. To measure the performance of the proposed system, three databases are used: the NOIZEUS speech database for audio-only, the ORL face database for visual-only, and the VALID audio-visual multimodal database for audio-visual speaker identification. To compare the accuracy of the proposed system with existing techniques under various noisy environments, different types of artificial noise are added at various rates to the audio and visual signals, and performance is compared across different variations of audio and visual features.
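The fusion step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the reliability measure or the network architecture, so everything below is an assumption made for illustration: `reliability` uses the mean gap between the best log-likelihood and the rest (one common choice, not necessarily the authors'), the function names are hypothetical, and the final back-propagation network is replaced by a plain argmax over the fused scores.

```python
from typing import List, Tuple


def reliability(log_likelihoods: List[float]) -> float:
    """Assumed reliability measure: mean gap between the best
    log-likelihood and the others. A larger gap suggests that the
    modality separates the candidate speakers more confidently."""
    n = len(log_likelihoods)
    if n < 2:
        return 0.0
    best = max(log_likelihoods)
    return sum(best - ll for ll in log_likelihoods) / (n - 1)


def fuse_scores(audio_ll: List[float],
                visual_ll: List[float]) -> Tuple[List[float], float]:
    """Combine per-speaker HMM log-likelihoods from both modalities.

    Returns the fused scores and the audio weight w_a in [0, 1];
    the visual weight is 1 - w_a."""
    if len(audio_ll) != len(visual_ll):
        raise ValueError("both modalities must score the same speakers")
    r_a = reliability(audio_ll)
    r_v = reliability(visual_ll)
    total = r_a + r_v
    # Fall back to equal weights when neither modality is informative.
    w_a = 0.5 if total == 0 else r_a / total
    fused = [w_a * a + (1.0 - w_a) * v for a, v in zip(audio_ll, visual_ll)]
    return fused, w_a


def identify(audio_ll: List[float], visual_ll: List[float]) -> int:
    """Stand-in for the BPN stage: pick the speaker index with the
    highest fused score (simple argmax, not a neural network)."""
    fused, _ = fuse_scores(audio_ll, visual_ll)
    return max(range(len(fused)), key=fused.__getitem__)


# Illustrative values only: audio clearly favours speaker 0,
# while the visual scores are nearly flat.
audio = [-10.0, -50.0, -52.0]
visual = [-30.0, -29.0, -31.0]
speaker = identify(audio, visual)
```

In this sketch a noisy modality yields flatter log-likelihoods, so its reliability and therefore its weight shrink. In the paper's system the fused likelihood ratios would instead be passed to a trained back-propagation network.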
