
Multimodal Video Sentiment Analysis Using Audio and Text Data
Author(s) - Yanyan Wang
Publication year - 2021
Publication title - Journal of Advances in Mathematics and Computer Science
Language(s) - English
Resource type - Journals
ISSN - 2456-9968
DOI - 10.9734/jamcs/2021/v36i730381
Subject(s) - computer science , word2vec , artificial intelligence , sentiment analysis , transfer of learning , binary classification , ranking (information retrieval) , natural language processing , speech recognition , machine learning , information retrieval , support vector machine , embedding
Nowadays, video-sharing websites such as YouTube and TikTok are becoming increasingly popular. A good way to analyze a video’s sentiment would greatly improve the user experience and help in designing better ranking and recommendation systems [1,2]. In this project, we used both the acoustic and textual information of a video to predict its sentiment. For audio data, we leveraged transfer learning and used a pre-trained VGGish model as a feature extractor to obtain abstract audio embeddings [6]. We then used the MOSI dataset [5] to further fine-tune the VGGish model and achieved a test accuracy of 90% on binary classification. For text data, we compared a traditional bag-of-words model to an LSTM model. We found that the LSTM model with word2vec embeddings outperformed the bag-of-words model and achieved a test accuracy of 84% on binary classification.
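The comparison in the abstract rests on a key difference between the two text representations: a bag-of-words vector discards word order, whereas an LSTM over word2vec embeddings consumes tokens sequentially and can exploit ordering. A minimal sketch of the order-insensitive bag-of-words representation is shown below (illustrative only; the function names and tokenization are our own assumptions, not the paper's implementation):

```python
def build_vocab(corpus):
    """Assign each distinct lowercase token an index, in order of first appearance."""
    vocab = {}
    for sentence in corpus:
        for token in sentence.lower().split():
            vocab.setdefault(token, len(vocab))
    return vocab


def bag_of_words(sentence, vocab):
    """Count vector over the vocabulary. Note this is order-insensitive:
    "good movie" and "movie good" map to the same vector, which is the
    information an LSTM over an embedding sequence would retain."""
    vec = [0] * len(vocab)
    for token in sentence.lower().split():
        if token in vocab:  # out-of-vocabulary tokens are simply dropped
            vec[vocab[token]] += 1
    return vec
```

In the sequential alternative, each token would instead be mapped to a dense word2vec embedding and the resulting sequence fed to an LSTM, so that negations and word order ("not a good movie" vs. "a good movie, not...") remain distinguishable to the classifier.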