Exploring and inferring user–user pseudo‐friendship for sentiment analysis with heterogeneous networks
Author(s) -
Deng Hongbo,
Han Jiawei,
Li Hao,
Ji Heng,
Wang Hongning,
Lu Yue
Publication year - 2014
Publication title -
Statistical Analysis and Data Mining: The ASA Data Science Journal
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.381
H-Index - 33
eISSN - 1932-1872
pISSN - 1932-1864
DOI - 10.1002/sam.11223
Subject(s) - computer science , friendship , similarity (geometry) , sentiment analysis , information retrieval , social network analysis , data mining , social media , social network (sociolinguistics) , artificial intelligence , machine learning , world wide web , psychology , social psychology , image (mathematics)
With the development of social media and social networks, user‐generated content, such as forums, blogs and comments, is not only getting richer but also ubiquitously interconnected with many other objects and entities, forming a heterogeneous information network. Sentiment analysis on such data can no longer ignore the information network, since it carries rich and valuable information, both explicit and implicit, only some of which can be directly observed. However, most existing methods rely heavily on the observed user–user friendship or similarity between objects, and can only handle a subgraph associated with a single topic. None of them takes into account hidden and implicit dissimilarity, opposite opinions, or foe relationships. In this paper, we propose a novel information network‐based framework that infers hidden similarity and dissimilarity between users by exploring similar and opposite opinions, so as to improve post‐level and user‐level sentiment classification at the same time. More specifically, we develop a new meta path‐based measure for inferring pseudo‐friendship as well as dissimilarity between users, and propose a semi‐supervised refining model that encodes similarity and dissimilarity from both user‐level and post‐level relations. We extensively evaluate the proposed approach and compare it with several state‐of‐the‐art techniques on two real‐world forum datasets. Experimental results show that our proposed model with 10.5% labeled samples can achieve better performance than a traditional supervised model trained on 61.7% of the data samples.
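The abstract does not give the form of the meta path‐based measure, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a hypothetical user–topic–user meta path, polarity labels of +1 or -1 per opinion, and a PathSim‐style normalization; the function name `signed_user_similarity` and the input layout are likewise assumptions made for this example. Positive scores can be read as pseudo‐friendship, negative scores as dissimilarity.

```python
from collections import defaultdict
from itertools import combinations


def signed_user_similarity(opinions):
    """Illustrative signed similarity along a user -> topic -> user path.

    opinions: iterable of (user, topic, polarity), polarity in {+1, -1}.
    Returns a dict mapping (u, v), with u < v, to a score in [-1, 1].
    This is an assumed, simplified measure, not the one from the paper.
    """
    # Aggregate each user's stance on each topic by summing polarities.
    stance = defaultdict(lambda: defaultdict(int))
    for user, topic, polarity in opinions:
        stance[user][topic] += polarity

    scores = {}
    for u, v in combinations(sorted(stance), 2):
        shared = set(stance[u]) & set(stance[v])
        if not shared:
            continue  # no connecting path, so nothing can be inferred
        # Agreement adds to the cross term, disagreement subtracts from it.
        cross = sum(stance[u][t] * stance[v][t] for t in shared)
        # PathSim-style normalization by each user's self-path weight.
        denom = (sum(s * s for s in stance[u].values())
                 + sum(s * s for s in stance[v].values()))
        if denom == 0:
            continue  # both users have only neutral aggregate stances
        scores[(u, v)] = 2 * cross / denom
    return scores


if __name__ == "__main__":
    toy = [
        ("alice", "tax", 1), ("bob", "tax", 1), ("carol", "tax", -1),
        ("alice", "health", -1), ("carol", "health", 1),
    ]
    for pair, score in signed_user_similarity(toy).items():
        print(pair, round(score, 3))
```

In this toy input, alice and bob agree on their only shared topic and receive a positive score, while alice and carol disagree on every shared topic and receive the minimum score. A refining model of the kind the abstract describes could use such signed scores as soft constraints, but how the paper encodes them is not stated here.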