Cross lingual speech emotion recognition via triple attentive asymmetric convolutional neural network
Author(s) -
Ocquaye Elias N. N.,
Mao Qirong,
Xue Yanfei,
Song Heping
Publication year - 2021
Publication title -
International Journal of Intelligent Systems
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.291
H-Index - 87
eISSN - 1098-111X
pISSN - 0884-8173
DOI - 10.1002/int.22291
Subject(s) - computer science , softmax function , discriminative model , artificial intelligence , convolutional neural network , speech recognition , pattern recognition (psychology) , feature (linguistics) , domain adaptation , natural language processing , classifier (uml) , linguistics , philosophy
Abstract The application of cross‐corpus domain adaptation methods to speech emotion recognition (SER) has gained wide acknowledgment for building robust emotion recognition systems from different corpora or data sets. However, the cross‐lingual setting remains a challenge in SER and needs further attention to handle scenarios where different languages are used in training and testing. In this paper, we propose a triple attentive asymmetric convolutional neural network to address the recognition of emotions in cross‐lingual and cross‐corpus speech in an unsupervised manner. The proposed method adopts the joint supervision of softmax loss and center loss to learn highly discriminative feature representations for the target domain via the use of high‐quality pseudo‐labels. The proposed model uses three attentive convolutional neural networks asymmetrically: two of the networks artificially label unlabeled target samples, based on predictions learned from the labeled source samples, and the third network learns salient discriminative features from the pseudo‐labeled target samples. We evaluate the proposed method on data sets in three different languages (i.e., English, German, and Italian). The experimental results indicate that our proposed method achieves higher prediction accuracy than other state‐of‐the‐art methods.
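The two core mechanisms the abstract describes, joint softmax-plus-center-loss supervision and agreement-based pseudo-labeling by two networks, can be sketched in plain Python. This is a minimal illustration, not the paper's implementation: the loss weight `lam`, the center update rate `alpha`, and the simple "label only on agreement" rule are assumptions for the sketch.

```python
import math

def softmax_cross_entropy(logits, label):
    # Numerically stable softmax cross-entropy for a single sample.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    return -math.log(exps[label] / sum(exps))

def center_loss(feature, center):
    # Squared Euclidean distance between a feature vector and its class center.
    return 0.5 * sum((f - c) ** 2 for f, c in zip(feature, center))

def joint_loss(logits, feature, label, centers, lam=0.5):
    # Joint supervision: L = L_softmax + lam * L_center.
    # `lam` is a tunable balancing weight (assumed value here).
    return softmax_cross_entropy(logits, label) + lam * center_loss(feature, centers[label])

def update_center(center, feature, alpha=0.5):
    # Move the class center toward the observed feature
    # (simplified center update; `alpha` is an assumed learning rate).
    return [c + alpha * (f - c) for c, f in zip(center, feature)]

def pseudo_label(probs1, probs2):
    # Hedged simplification of asymmetric pseudo-labeling: the two labeling
    # networks assign a pseudo-label only when their argmax predictions agree.
    p1 = max(range(len(probs1)), key=probs1.__getitem__)
    p2 = max(range(len(probs2)), key=probs2.__getitem__)
    return p1 if p1 == p2 else None
```

In this sketch, target samples that receive a pseudo-label from `pseudo_label` would then be fed to the third network, which is trained with `joint_loss` to pull same-class target features toward a shared center while the softmax term keeps classes separable.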