Open Access
Micro‐blog sentiment classification using Doc2vec + SVM model with data purification
Author(s) - Liang Yinghong, Liu Haitao, Zhang Su
Publication year - 2020
Publication title - The Journal of Engineering
Language(s) - English
Resource type - Journals
ISSN - 2051-3305
DOI - 10.1049/joe.2019.1159
Subject(s) - bottleneck, computer science, sentiment analysis, support vector machine, microblogging, artificial intelligence, task (project management), machine learning, social media, big data, data mining, world wide web, engineering, systems engineering, embedded system
As a Chinese counterpart of Twitter, micro‐blog has been popular for many years. Every day, the platform generates a very large volume of comments. These comments carry the public's opinions on various topics and have wide applications in both academia and industry. In recent years, deep learning and several classification algorithms have been applied to sentiment analysis with good results. However, micro‐blog sentiment classification remains challenging because micro‐blog messages are short and noisy and contain many user‐invented acronyms and informal words. Most researchers concentrate on analysing the data after deep learning and only perform simple noise removal before applying the algorithm, so sentiment analysis results have reached a bottleneck. Here, the authors first purify the data using a variety of methods before deep learning; the Support Vector Machine (SVM) classification algorithm is then applied to micro‐blog sentiment classification using many types of features. Compared with simple data pre‐processing, the results show that their approach improves the performance of micro‐blog sentiment classification effectively and efficiently.
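The abstract does not list the purification methods, so the following is only a minimal sketch of what a purification pass over a noisy message might look like before features are passed to a Doc2vec + SVM stage. The function name `purify`, the `SLANG` table, and the specific rules (dropping URLs and @-mentions, collapsing elongated characters, expanding acronyms) are assumptions made for illustration, not the authors' procedure.

```python
import re

# Hypothetical acronym table; the paper's actual lexicon is not given in the abstract.
SLANG = {"u": "you", "gr8": "great", "thx": "thanks"}

URL_RE = re.compile(r"https?://\S+")   # links carry little sentiment
MENTION_RE = re.compile(r"@\w+")       # user mentions
REPEAT_RE = re.compile(r"(.)\1{2,}")   # any character repeated 3+ times


def purify(message: str) -> str:
    """Return a cleaned copy of a micro-blog message (illustrative rules only)."""
    text = URL_RE.sub(" ", message)
    text = MENTION_RE.sub(" ", text)
    text = text.replace("#", " ")            # keep topic words, drop the marker
    text = REPEAT_RE.sub(r"\1\1", text)      # "sooooo" -> "soo"
    # Expand known acronyms; unknown tokens pass through unchanged.
    tokens = [SLANG.get(tok.lower(), tok) for tok in text.split()]
    return " ".join(tokens)


if __name__ == "__main__":
    print(purify("@bob thx u r sooooo gr8 http://t.cn/abc"))
```

Each rule is a separate substitution, so one can be switched off to check how much it contributes to classification accuracy.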