
An Adaptive Methodology for Constructing Domain-Specific Sentiment Lexicons Based on Chinese Social Media Data
Author(s) - Xue Xu, Haidong Liu, Lei Liu
Publication year - 2025
Publication title - IEEE Access
Language(s) - English
Resource type - Magazines
SCImago Journal Rank - 0.587
H-Index - 127
eISSN - 2169-3536
DOI - 10.1109/access.2025.3572077
Subject(s) - aerospace , bioengineering , communication, networking and broadcast technologies , components, circuits, devices and systems , computing and processing , engineered materials, dielectrics and plasmas , engineering profession , fields, waves and electromagnetics , general topics for engineers , geoscience , nuclear engineering , photonics and electrooptics , power, energy and industry applications , robotics and control systems , signal processing and analysis , transportation
Currently, many methods for automatically constructing domain-specific sentiment lexicons rely on knowledge repositories and domain-specific corpora. However, these methods often face accuracy challenges due to data sparsity, and inferring the polarity of new domain-specific sentiment words from a limited set of labeled seed words lacks precision. Chinese social media texts typically exhibit a high degree of randomness and noise and contain many informal sentiment words, which further increases the difficulty of constructing domain-specific sentiment lexicons. To address these challenges, we propose an adaptive framework for constructing domain-specific sentiment lexicons using Chinese social media data and apply it to develop a sentiment lexicon for public opinion during public health emergencies (PHEPO-SentiLex). We first fine-tune Bidirectional Encoder Representations from Transformers (BERT) via a multi-task framework on a domain-specific corpus and a small number of sentiment-annotated Weibo datasets, enabling the model to encode both domain semantics and sentiment-related contextual patterns into word embeddings through gradient sharing. The embeddings are subsequently used in three places: to calculate the Sentiment Attraction Degree (SAD) during seed word filtering, to compute cosine similarity during domain-specific sentiment word selection, and to construct the domain-specific corpus-sentiment word graph (SentiGraph). Next, we propose SentiGraph-GCN, a method for sentiment word polarity determination that integrates semantic, sentiment, co-occurrence frequency, and global structural information embedded in the corpus. Experimental results demonstrate that SentiGraph-GCN significantly outperforms existing methods in determining sentiment word polarity. Furthermore, PHEPO-SentiLex exhibits superior accuracy and stability in relevant scenarios compared to general-purpose sentiment lexicons.
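The cosine-similarity step used for domain-specific sentiment word selection can be illustrated with a minimal sketch. It assumes that word embeddings are already available as plain lists of floats; the function names, the toy three-dimensional vectors, the English example words, and the 0.8 threshold are all hypothetical, since the abstract does not state the selection rule or threshold the authors use. SAD and SentiGraph-GCN are not defined in enough detail here and are not sketched.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)


def select_candidates(candidates, seeds, threshold=0.8):
    """Keep candidates whose best similarity to any seed meets the threshold.

    `candidates` and `seeds` map words to embedding vectors. The threshold
    is an assumed value for illustration only. Returns a list of
    (word, nearest_seed, similarity) sorted by descending similarity.
    """
    if not seeds:
        raise ValueError("at least one seed word is required")
    selected = []
    for word, vec in candidates.items():
        # Find the seed word closest to this candidate.
        best_seed, best_sim = max(
            ((seed, cosine_similarity(vec, seed_vec))
             for seed, seed_vec in seeds.items()),
            key=lambda pair: pair[1],
        )
        if best_sim >= threshold:
            selected.append((word, best_seed, best_sim))
    return sorted(selected, key=lambda item: item[2], reverse=True)


# Toy embeddings with hypothetical values, standing in for BERT vectors.
seeds = {"good": [1.0, 0.0, 0.0], "bad": [-1.0, 0.0, 0.0]}
candidates = {
    "great": [0.9, 0.1, 0.0],
    "awful": [-0.8, 0.2, 0.0],
    "table": [0.0, 0.0, 1.0],
}

result = select_candidates(candidates, seeds, threshold=0.8)
for word, seed, sim in result:
    print(f"{word} -> nearest seed {seed} (similarity {sim:.3f})")
```

In this toy run, "great" and "awful" are retained because each lies close to one seed, while "table" is orthogonal to both seeds and is dropped. In the framework described above, the retained words would then need a polarity decision, which the authors assign with SentiGraph-GCN rather than with the nearest-seed label shown here.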