An Adaptive Methodology for Constructing Domain-Specific Sentiment Lexicons Based on Chinese Social Media Data
Author(s) - Xue Xu, Haidong Liu, Lei Liu
Publication year - 2025
Publication title - IEEE Access
Language(s) - English
Resource type - Magazines
SCImago Journal Rank - 0.587
H-Index - 127
eISSN - 2169-3536
DOI - 10.1109/access.2025.3572077
Subject(s) - aerospace, bioengineering, communication, networking and broadcast technologies, components, circuits, devices and systems, computing and processing, engineered materials, dielectrics and plasmas, engineering profession, fields, waves and electromagnetics, general topics for engineers, geoscience, nuclear engineering, photonics and electrooptics, power, energy and industry applications, robotics and control systems, signal processing and analysis, transportation
Many current methods for automatically constructing domain-specific sentiment lexicons rely on knowledge repositories and domain-specific corpora. However, these methods often suffer from reduced accuracy because of data sparsity, and inferring the polarity of new domain-specific sentiment words from a limited set of labeled seed words lacks precision. Chinese social media texts are typically highly random, noisy, and rich in informal sentiment words, which makes constructing domain-specific sentiment lexicons even harder. To address these challenges, we propose an adaptive framework for constructing domain-specific sentiment lexicons from Chinese social media data and apply it to develop a sentiment lexicon for public opinion during public health emergencies (PHEPO-SentiLex). We first fine-tune Bidirectional Encoder Representations from Transformers (BERT) via a multi-task framework on a domain-specific corpus and a small number of sentiment-annotated Weibo datasets, enabling the model to encode both domain semantics and sentiment-related contextual patterns into word embeddings through gradient sharing. The embeddings are subsequently used to calculate the Sentiment Attraction Degree (SAD) during seed word filtering, to compute cosine similarity during domain-specific sentiment word selection, and to construct the domain-specific corpus-sentiment word graph (SentiGraph). Next, we propose SentiGraph-GCN, a method for determining sentiment word polarity that integrates the semantic, sentiment, co-occurrence frequency, and global structural information embedded in the corpus. Experimental results demonstrate that SentiGraph-GCN significantly outperforms existing methods in determining sentiment word polarity. Furthermore, PHEPO-SentiLex exhibits superior accuracy and stability in relevant scenarios compared to general-purpose sentiment lexicons.
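The abstract names cosine similarity as the measure used during domain-specific sentiment word selection. The sketch below illustrates only that one step, under stated assumptions: the toy three-dimensional vectors stand in for the fine-tuned BERT embeddings, and the function names, the word labels, the "best similarity to any seed" rule, and the threshold value are all hypothetical choices made for illustration. None of them are taken from the paper, and the SAD and SentiGraph-GCN steps are not reproduced here.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def select_candidates(seed_vectors, candidate_vectors, threshold=0.8):
    """Keep candidates whose best similarity to any seed meets the threshold.

    Assumes seed_vectors is non-empty. The selection rule and the default
    threshold are illustrative assumptions, not the paper's procedure.
    """
    selected = {}
    for word, vec in candidate_vectors.items():
        best = max(cosine_similarity(vec, s) for s in seed_vectors.values())
        if best >= threshold:
            selected[word] = best
    return selected


# Toy 3-dimensional vectors standing in for fine-tuned BERT embeddings.
seeds = {"seed_pos": [1.0, 0.0, 0.0], "seed_neg": [0.0, 1.0, 0.0]}
candidates = {
    "cand_a": [2.0, 0.0, 0.0],  # same direction as seed_pos
    "cand_b": [0.0, 0.0, 1.0],  # orthogonal to both seeds
    "cand_c": [0.0, 3.0, 4.0],  # partially aligned with seed_neg
}
selected = select_candidates(seeds, candidates, threshold=0.5)
```

With these toy inputs, cand_a and cand_c pass the threshold and cand_b is dropped. In practice the threshold would need to be tuned on held-out data, since cosine similarities between contextual embeddings tend to cluster in a narrow range.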