
Feature Specific Hybrid Framework on Composition of Deep Learning Architecture for Speech Emotion Recognition
Author(s) -
Mansoor Hussain,
S Abishek,
K P Ashwanth,
C Bharanidharan,
Sharath Girish
Publication year - 2021
Publication title -
Journal of Physics: Conference Series
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.21
H-Index - 85
eISSN - 1742-6596
pISSN - 1742-6588
DOI - 10.1088/1742-6596/1916/1/012094
Subject(s) - computer science, softmax function, speech recognition, artificial intelligence, deep learning, feature (linguistics), feature extraction, feature learning, artificial neural network, mel frequency cepstrum, categorical variable, pattern recognition (psychology), machine learning, philosophy, linguistics
Speech cues can be used to identify human emotions with deep learning models for speech emotion recognition, trained with supervised or unsupervised machine learning and evaluated against speech emotion databases for test-data prediction. Despite many advantages, such systems still suffer from limited accuracy, among other shortcomings. To mitigate these issues, we propose a new feature-specific hybrid framework that composes deep learning architectures, namely a recurrent neural network and a convolutional neural network, for speech emotion recognition. The framework analyses different characteristics of the signal to better describe speech emotion. It first applies a bag-of-audio-words feature extraction technique to Mel-frequency cepstral coefficient (MFCC) features, producing a pack of acoustic words composed of emotion features that feeds the hybrid deep learning architecture and yields high classification and prediction accuracy. The outputs of the two hybrid networks are then concatenated and passed to a softmax layer, which produces a categorical classification for speech emotion recognition. The proposed model is evaluated on the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) dataset, which comprises eight emotion classes. Experimental results on this dataset show that the proposed framework outperforms state-of-the-art approaches, achieving an 89.5% recognition rate and 98% accuracy.
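The abstract outlines a pipeline of MFCC feature extraction feeding parallel CNN and RNN branches whose outputs are concatenated into a softmax classifier over the eight RAVDESS emotion classes. Below is a minimal sketch of such a hybrid architecture, assuming librosa for MFCC extraction and Keras/TensorFlow for the model; the layer sizes, sequence length, and omission of the bag-of-audio-words step are illustrative assumptions, not the authors' exact configuration.

import numpy as np
import librosa
import tensorflow as tf
from tensorflow.keras import layers, Model

NUM_CLASSES = 8    # RAVDESS: eight emotion classes
N_MFCC = 40        # MFCC coefficients per frame (assumed)
MAX_FRAMES = 200   # fixed sequence length after padding/truncation (assumed)

def extract_mfcc(path, n_mfcc=N_MFCC, max_frames=MAX_FRAMES):
    """Load an audio clip and return a (max_frames, n_mfcc) MFCC matrix."""
    y, sr = librosa.load(path, sr=None)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T  # (frames, n_mfcc)
    if mfcc.shape[0] < max_frames:  # pad short clips with zero frames
        pad = np.zeros((max_frames - mfcc.shape[0], n_mfcc))
        mfcc = np.vstack([mfcc, pad])
    return mfcc[:max_frames]

def build_hybrid_model():
    inp = layers.Input(shape=(MAX_FRAMES, N_MFCC))

    # CNN branch: captures local spectral-temporal patterns
    c = layers.Conv1D(64, kernel_size=5, activation="relu")(inp)
    c = layers.MaxPooling1D(pool_size=2)(c)
    c = layers.Conv1D(128, kernel_size=5, activation="relu")(c)
    c = layers.GlobalMaxPooling1D()(c)

    # RNN branch: captures long-range temporal dynamics
    r = layers.LSTM(128)(inp)

    # Concatenate both branches and classify with a softmax layer,
    # as described in the abstract
    merged = layers.Concatenate()([c, r])
    merged = layers.Dense(128, activation="relu")(merged)
    out = layers.Dense(NUM_CLASSES, activation="softmax")(merged)

    model = Model(inp, out)
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",  # categorical classification
                  metrics=["accuracy"])
    return model

if __name__ == "__main__":
    model = build_hybrid_model()
    model.summary()

Feeding the same MFCC sequence to both branches lets the Conv1D stack summarize short-term spectral shape while the LSTM tracks how emotion-bearing prosody evolves over the utterance, which is the motivation the abstract gives for combining the two architectures.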