
Emotion Recognition Model Based on Multimodal Decision Fusion
Author(s) - Zheng Chen, Chunli Wang, Ning Jia
Publication year - 2021
Publication title - Journal of Physics: Conference Series
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.21
H-Index - 85
eISSN - 1742-6596
pISSN - 1742-6588
DOI - 10.1088/1742-6596/1873/1/012092
Subject(s) - computer science, speech recognition, facial expression, emotion recognition, extrapolation, feature extraction, modal, motion (physics), artificial intelligence, action recognition, pattern recognition (psychology), class (philosophy), mathematical analysis, chemistry, mathematics, polymer chemistry
In human social activity and daily communication, speech, text and facial expressions are the main channels for conveying emotion. This paper proposes a multi-modal emotion recognition method that fuses speech, text and motion at the decision level. For speech emotion recognition (SER), a depth wavefield extrapolation - improved wave physics model (DWE-WPM) is designed; to simulate the information-mining process of an LSTM, a user-defined feature extraction scheme reconstructs the wavefield and injects it into the DWE-WPM. For text emotion recognition (TER), a transformer model with a multi-attention mechanism is used. For motion emotion recognition (MER), sequential features of facial expressions and hand actions are extracted in groups and, combined with a bidirectional three-layer LSTM with an attention mechanism, a joint model of four channels is designed. Experimental results show that the proposed method achieves high recognition accuracy across modalities, improving accuracy by 9% on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) corpus.
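To make the described pipeline concrete, the sketch below (not the authors' code) shows one way the motion channel and the decision-level fusion could be wired up in PyTorch: a bidirectional three-layer LSTM with attention pooling over the frame sequence, and a fusion step that averages per-channel class probabilities. The class count, feature dimensions, attention pooling, and equal fusion weights are illustrative assumptions, and the speech (DWE-WPM) and text (transformer) channels are represented only by placeholder outputs.

```python
# Minimal sketch, assuming a 4-class setup and 64-dim motion features.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_CLASSES = 4  # assumed, e.g. angry, happy, neutral, sad (common IEMOCAP subset)

class MotionChannel(nn.Module):
    """Bidirectional three-layer LSTM with attention pooling, operating on
    grouped facial-expression / hand-action feature sequences."""
    def __init__(self, feat_dim=64, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, num_layers=3,
                            bidirectional=True, batch_first=True)
        self.attn = nn.Linear(2 * hidden, 1)       # scores each time step
        self.head = nn.Linear(2 * hidden, NUM_CLASSES)

    def forward(self, x):                           # x: (batch, time, feat_dim)
        h, _ = self.lstm(x)                         # (batch, time, 2*hidden)
        w = torch.softmax(self.attn(h), dim=1)      # attention weights over time
        ctx = (w * h).sum(dim=1)                    # attention-weighted context vector
        return self.head(ctx)                       # per-class logits

def decision_fusion(logits_per_channel, weights=None):
    """Decision-level fusion: combine per-channel class probabilities by a
    (optionally weighted) average and return the fused prediction."""
    probs = [F.softmax(l, dim=-1) for l in logits_per_channel]
    if weights is None:
        weights = [1.0 / len(probs)] * len(probs)
    fused = sum(w * p for w, p in zip(weights, probs))
    return fused.argmax(dim=-1), fused

if __name__ == "__main__":
    motion = MotionChannel()
    x = torch.randn(2, 50, 64)                      # 2 clips, 50 frames, 64-dim features
    motion_logits = motion(x)
    # Placeholder outputs standing in for the speech (DWE-WPM) and text channels.
    speech_logits = torch.randn(2, NUM_CLASSES)
    text_logits = torch.randn(2, NUM_CLASSES)
    pred, fused = decision_fusion([speech_logits, text_logits, motion_logits])
    print(pred, fused.shape)
```

Fusing at the decision level keeps each modality's classifier independent, so any channel can be trained, tuned, or replaced on its own before its class probabilities enter the final vote.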