
Bimodal Emotion Recognition using Machine Learning
Author(s) -
S. Manisha,
H Saida Nafisa,
Nandita Gopal,
René Anand
Publication year - 2021
Publication title -
International Journal of Engineering and Advanced Technology
Language(s) - English
Resource type - Journals
ISSN - 2249-8958
DOI - 10.35940/ijeat.d2451.0410421
Subject(s) - computer science, speech recognition, surprise, emotion classification, convolutional neural network, disgust, facial expression, artificial intelligence, anger, modality (human–computer interaction), audio visual, multimedia, psychology, communication, psychiatry
Abstract - The predominant communication channel for conveying relevant and high-impact information is the emotion embedded in our communications. In recent years, researchers have tried to exploit these emotions for human-robot interaction (HRI) and human-computer interaction (HCI). Emotion recognition through speech alone or through facial expression alone is termed single-mode emotion recognition. The proposed bimodal method improves the accuracy of these single-mode approaches by combining the speech and face modalities and recognizing emotions with a Convolutional Neural Network (CNN) model. The proposed bimodal emotion recognition system contains three major parts: audio processing, video processing, and data fusion for detecting a person's emotion. Fusing visual information and audio data obtained from two different channels enhances the emotion recognition rate by providing complementary data. The proposed method aims to classify 7 basic emotions (anger, disgust, fear, happy, neutral, sad, surprise) from an input video; audio and image frames are extracted from the video input to predict the final emotion of a person. The dataset used is an audio-visual dataset uniquely suited to the study of multi-modal emotion expression and perception: the RAVDESS dataset, which contains audio-visual, video-only, and audio-only subsets. For bimodal emotion detection, the audio-visual subset is used.
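The fusion step described in the abstract can be illustrated with a minimal decision-level (late) fusion sketch. This is not the paper's implementation: the probability vectors below are made-up placeholders standing in for the softmax outputs of two separately trained CNNs (one on audio features, one on face frames), and the weighting scheme is an assumed example.

```python
# Illustrative decision-level fusion for bimodal emotion recognition.
# The seven emotion labels follow the classes named in the abstract;
# all probability values are hypothetical placeholders.

EMOTIONS = ["anger", "disgust", "fear", "happy", "neutral", "sad", "surprise"]

def fuse_predictions(audio_probs, video_probs, audio_weight=0.5):
    """Weighted average of the two modalities' class probabilities."""
    if len(audio_probs) != len(video_probs):
        raise ValueError("modalities must score the same classes")
    w = audio_weight
    return [w * a + (1.0 - w) * v for a, v in zip(audio_probs, video_probs)]

def predict_emotion(audio_probs, video_probs, audio_weight=0.5):
    """Return the emotion label with the highest fused probability."""
    fused = fuse_predictions(audio_probs, video_probs, audio_weight)
    best = max(range(len(fused)), key=fused.__getitem__)
    return EMOTIONS[best]

# Hypothetical per-modality softmax outputs for one video clip:
audio = [0.05, 0.05, 0.10, 0.40, 0.20, 0.10, 0.10]  # audio CNN leans "happy"
video = [0.05, 0.05, 0.05, 0.55, 0.15, 0.10, 0.05]  # face CNN also leans "happy"
print(predict_emotion(audio, video))  # prints "happy"
```

Averaging class probabilities is one of the simplest fusion strategies; feature-level fusion (concatenating intermediate CNN features before a joint classifier) is the common alternative, and which one the paper uses is not specified in this abstract.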