
Feature Specific Hybrid Framework on Composition of Deep Learning Architecture for Speech Emotion Recognition
Author(s) -
Mansoor Hussain,
S Abishek,
K P Ashwanth,
C Bharanidharan,
Sharath Girish
Publication year - 2021
Publication title -
Journal of Physics: Conference Series
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.21
H-Index - 85
eISSN - 1742-6596
pISSN - 1742-6588
DOI - 10.1088/1742-6596/1916/1/012094
Subject(s) - computer science, softmax function, speech recognition, artificial intelligence, deep learning, feature (linguistics), feature extraction, feature learning, artificial neural network, mel frequency cepstrum, categorical variable, pattern recognition (psychology), machine learning, philosophy, linguistics
Speech cues can be used to identify human emotions with deep learning models for speech emotion recognition, trained with supervised or unsupervised machine learning and evaluated against speech emotion databases for test-data prediction. Despite many advantages, such systems still suffer from limited accuracy, among other shortcomings. To mitigate these issues, we propose a new feature-specific hybrid framework that composes deep learning architectures, namely a recurrent neural network and a convolutional neural network, for speech emotion recognition. The framework analyses different characteristics of the signal to better describe speech emotion. It first applies a bag-of-audio-words feature extraction technique to Mel-frequency cepstral coefficient (MFCC) features, producing a pack of acoustic words composed of emotion features that feeds the hybrid deep learning architecture and yields high classification and prediction accuracy. The outputs of the two hybrid networks are then concatenated and passed to a softmax layer, which produces a categorical classification for speech emotion recognition. The proposed model is evaluated on the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) dataset, which comprises eight emotion classes. Experimental results on this dataset show that the proposed framework outperforms state-of-the-art approaches, achieving an 89.5% recognition rate and 98% accuracy.
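The abstract outlines a pipeline of MFCC feature extraction feeding parallel CNN and RNN branches whose outputs are concatenated into a softmax classifier over the eight RAVDESS emotion classes. Below is a minimal sketch of such a hybrid architecture, assuming librosa for MFCC extraction and Keras/TensorFlow for the model; the layer sizes, sequence length, and omission of the bag-of-audio-words step are illustrative assumptions, not the authors' exact configuration.

import numpy as np
import librosa
import tensorflow as tf
from tensorflow.keras import layers, Model

NUM_CLASSES = 8    # RAVDESS: eight emotion classes
N_MFCC = 40        # MFCC coefficients per frame (assumed)
MAX_FRAMES = 200   # fixed sequence length after padding/truncation (assumed)

def extract_mfcc(path, n_mfcc=N_MFCC, max_frames=MAX_FRAMES):
    """Load an audio clip and return a (max_frames, n_mfcc) MFCC matrix."""
    y, sr = librosa.load(path, sr=None)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T  # (frames, n_mfcc)
    if mfcc.shape[0] < max_frames:  # pad short clips with zero frames
        pad = np.zeros((max_frames - mfcc.shape[0], n_mfcc))
        mfcc = np.vstack([mfcc, pad])
    return mfcc[:max_frames]

def build_hybrid_model():
    inp = layers.Input(shape=(MAX_FRAMES, N_MFCC))

    # CNN branch: captures local spectral-temporal patterns
    c = layers.Conv1D(64, kernel_size=5, activation="relu")(inp)
    c = layers.MaxPooling1D(pool_size=2)(c)
    c = layers.Conv1D(128, kernel_size=5, activation="relu")(c)
    c = layers.GlobalMaxPooling1D()(c)

    # RNN branch: captures long-range temporal dynamics
    r = layers.LSTM(128)(inp)

    # Concatenate both branches and classify with a softmax layer,
    # as described in the abstract
    merged = layers.Concatenate()([c, r])
    merged = layers.Dense(128, activation="relu")(merged)
    out = layers.Dense(NUM_CLASSES, activation="softmax")(merged)

    model = Model(inp, out)
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",  # categorical classification
                  metrics=["accuracy"])
    return model

if __name__ == "__main__":
    model = build_hybrid_model()
    model.summary()

Feeding the same MFCC sequence to both branches lets the Conv1D stack summarize short-term spectral shape while the LSTM tracks how emotion-bearing prosody evolves over the utterance, which is the motivation the abstract gives for combining the two architectures.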