Multimodal Video Sentiment Analysis Using Audio and Text Data
Author(s) - Yanyan Wang
Publication year - 2021
Publication title - Journal of Advances in Mathematics and Computer Science
Language(s) - English
Resource type - Journals
ISSN - 2456-9968
DOI - 10.9734/jamcs/2021/v36i730381
Subject(s) - computer science , word2vec , leverage (statistics) , artificial intelligence , word (group theory) , sentiment analysis , transfer of learning , binary classification , extractor , test data , ranking (information retrieval) , natural language processing , speech recognition , machine learning , information retrieval , support vector machine , embedding , philosophy , linguistics , process engineering , engineering , programming language
Nowadays, video sharing websites such as YouTube and TikTok are becoming more and more popular. A good method for analyzing a video’s sentiment would greatly improve the user experience and would help in designing better ranking and recommendation systems [1,2]. In this project, we used both the acoustic and textual information of a video to predict its sentiment level. For audio data, we leveraged a transfer learning technique, using a pre-trained VGGish model as a feature extractor to obtain abstract audio embeddings [6]. We then used the MOSI dataset [5] to further fine-tune the VGGish model and achieved a test accuracy of 90% for binary classification. For text data, we compared a traditional bag-of-words model to an LSTM model. We found that the LSTM model with word2vec embeddings outperformed the bag-of-words model and achieved a test accuracy of 84% for binary classification.
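The abstract describes the audio branch only at a high level, so the following is a minimal sketch rather than the authors' code: it assumes frame-level 128-dimensional embeddings have already been produced by a pre-trained VGGish model (VGGish emits one 128-d vector per roughly 0.96 s of audio) and shows a small classification head for the binary sentiment task. The class name, pooling choice, and hidden size are illustrative assumptions.

```python
# Hypothetical sketch: binary sentiment head on top of pre-trained
# VGGish embeddings; the VGGish extractor itself is assumed given.
import torch
import torch.nn as nn

class AudioSentimentHead(nn.Module):
    """Mean-pool frame-level VGGish embeddings, then classify."""
    def __init__(self, embed_dim: int = 128, hidden_dim: int = 64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(embed_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 2),  # logits for positive / negative
        )

    def forward(self, embeddings: torch.Tensor) -> torch.Tensor:
        # embeddings: (batch, frames, 128) from a frozen or fine-tuned VGGish
        pooled = embeddings.mean(dim=1)  # average over audio frames
        return self.mlp(pooled)

# Usage with dummy tensors standing in for real VGGish output.
head = AudioSentimentHead()
dummy = torch.randn(4, 10, 128)  # 4 clips, 10 embedding frames each
logits = head(dummy)             # shape (4, 2)
```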

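For the text branch, the abstract names only the ingredients (word2vec embeddings fed to an LSTM), so this is likewise a hedged sketch under assumptions: pre-looked-up 300-dimensional word vectors (the common Google News word2vec size) and a single-layer LSTM whose final hidden state feeds a binary classifier. Names and dimensions are illustrative, not taken from the paper.

```python
# Hypothetical sketch of the text branch: an LSTM over word2vec
# vectors with a binary sentiment head.
import torch
import torch.nn as nn

class TextSentimentLSTM(nn.Module):
    def __init__(self, embed_dim: int = 300, hidden_dim: int = 128):
        super().__init__()
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, 2)  # logits for positive / negative

    def forward(self, word_vectors: torch.Tensor) -> torch.Tensor:
        # word_vectors: (batch, seq_len, 300) pre-looked-up word2vec vectors
        _, (h_n, _) = self.lstm(word_vectors)
        return self.fc(h_n[-1])  # classify from the final hidden state

model = TextSentimentLSTM()
dummy = torch.randn(4, 20, 300)  # 4 utterances, 20 tokens each
logits = model(dummy)            # shape (4, 2)
```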