Multimodal Fusion of BERT-CNN and Gated CNN Representations for Depression Detection

Mariana Rodrigues Makiuchi; Tifani Warnita; Kuniaki Uto; Koichi Shinoda

Open Access

Author(s) - Mariana Rodrigues Makiuchi, Tifani Warnita, Kuniaki Uto, Koichi Shinoda

Publication year - 2019

Publication title - Tokyo Tech Research Repository (Tokyo Institute of Technology)

Language(s) - English

Resource type - Conference proceedings

ISBN - 978-1-4503-6913-8

DOI - 10.1145/3347320.3357694

Subject(s) - computer science, convolutional neural network, modality (human–computer interaction), artificial intelligence, set (abstract data type), feature (linguistics), representation (politics), natural language processing, modalities, pattern recognition (psychology), speech recognition, machine learning, linguistics, social science, philosophy, sociology, politics, political science, law, programming language

Depression is a common but serious mental disorder that affects people all over the world. A computer-aided automatic depression assessment system is needed both to make diagnosis easier and to reduce subjective bias in it. We propose a multimodal fusion of speech and linguistic representations for depression detection. We train our model to infer the Patient Health Questionnaire (PHQ) score of subjects from the E-DAIC corpus, the database of the AVEC 2019 DDS Challenge. For the speech modality, we use deep spectrum features extracted from a pretrained VGG-16 network and employ a Gated Convolutional Neural Network (GCNN) followed by an LSTM layer. For the linguistic modality, we extract BERT textual features and employ a Convolutional Neural Network (CNN) followed by an LSTM layer. On the E-DAIC development set, the unimodal speech and linguistic models achieve concordance correlation coefficient (CCC) scores of 0.497 and 0.608, respectively. We further combine the two modalities with a feature fusion approach in which the last representation of each unimodal model is fed to a fully-connected layer that estimates the PHQ score. This multimodal approach achieves a CCC of 0.696 on the development set and 0.403 on the test set of the E-DAIC corpus, an absolute improvement of 0.283 over the challenge baseline.
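The CCC figures reported in the abstract refer to the concordance correlation coefficient, the metric used in the AVEC challenges to compare predicted and reference scores. As a minimal sketch of how such a score can be computed, the function below implements the standard definition, 2·cov(x, y) / (var(x) + var(y) + (mean(x) − mean(y))²), with population (1/n) statistics. The function name `concordance_cc` and the toy values are assumptions made for illustration; they are not taken from the paper or from the challenge evaluation code.

```python
from statistics import fmean


def concordance_cc(y_true, y_pred):
    """Concordance correlation coefficient between two equal-length sequences.

    Uses population (1/n) variance and covariance. Returns NaN when both
    sequences are constant and equal, since the ratio is then undefined.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(y_true)
    mean_t = fmean(y_true)
    mean_p = fmean(y_pred)

    # Population variances and covariance.
    var_t = sum((t - mean_t) ** 2 for t in y_true) / n
    var_p = sum((p - mean_p) ** 2 for p in y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred)) / n

    # The squared mean difference penalizes systematic bias,
    # which plain Pearson correlation would ignore.
    denom = var_t + var_p + (mean_t - mean_p) ** 2
    if denom == 0:
        return float("nan")
    return 2 * cov / denom


# Toy values only (hypothetical, not E-DAIC data).
reference = [1, 2, 3]
shifted = [2, 3, 4]  # perfectly correlated, but biased by +1

print(concordance_cc(reference, reference))
print(concordance_cc(reference, shifted))
```

Unlike Pearson correlation, which would score the shifted predictions as perfect, the CCC drops below 1 whenever predictions are offset or rescaled relative to the reference, which makes it a stricter measure for regression targets such as PHQ scores.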
