Open Access
Classification of negative publication in mass media using topic modeling
Author(s) -
Kirill Yakunin,
Ravil I. Mukhamediev,
Yan Kuchin,
Rustam Musabayev,
Timur Buldybayev,
Sanzhar Murzakhmetov
Publication year - 2021
Publication title -
Journal of Physics: Conference Series
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.21
H-Index - 85
eISSN - 1742-6596
pISSN - 1742-6588
DOI - 10.1088/1742-6596/1727/1/012019
Subject(s) - computer science, natural language processing, task (project management), topic model, text corpus, information retrieval, artificial intelligence, management, economics
The paper proposes a method for evaluating text documents by arbitrary criteria that combines topic modeling of a text corpus with multiple-criteria decision making. The evaluation is based on an analysis of the corpus: once the topic model of the corpus has been built, the conditional probability distribution of media by topics, properties, and classes is calculated. Weights assigned by experts to each topic can then be applied, together with the topic model, to evaluate each document in the corpus against each of the considered criteria and classes. The proposed method was applied to a corpus of 804,829 news publications from 40 Kazakhstani sources, published from 1 January 2018 to 31 December 2019, in order to classify negative information on socially significant topics. A BigARTM topic model with 200 topics was built, and the proposed method was applied to it. The experiments indicate that the sentiment of publications can, in general, be evaluated using a topic model of the corpus: a ROC AUC score of 0.93 was achieved on the classification task.
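The abstract does not state the exact scoring formula, so the following is only a minimal sketch of one plausible reading: each document's score is the weighted sum of its topic probabilities, using the expert-assigned topic weights, and the scores are then checked against labels with ROC AUC. The function names `score_documents` and `roc_auc`, the three-topic toy matrix `theta`, the weights, and the labels are all hypothetical illustrations, not the authors' data or code. In practice `theta` would come from a topic model such as BigARTM.

```python
import numpy as np


def score_documents(theta, topic_weights):
    """Score documents as a weighted sum of topic probabilities.

    theta: (n_docs, n_topics) document-topic probabilities (assumed input).
    topic_weights: (n_topics,) expert weights per topic (assumed in [0, 1]).
    """
    theta = np.asarray(theta, dtype=float)
    topic_weights = np.asarray(topic_weights, dtype=float)
    return theta @ topic_weights


def roc_auc(labels, scores):
    """ROC AUC as the probability that a random positive document
    outscores a random negative one; ties count as one half."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos = scores[labels]
    neg = scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)


# Toy example: 4 documents, 3 topics (hypothetical values).
theta = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.2, 0.6],
    [0.5, 0.4, 0.1],
])
# Hypothetical expert weights: topic 0 is strongly negative,
# topic 1 is neutral, topic 2 is moderately negative.
topic_weights = np.array([1.0, 0.0, 0.5])
# Hypothetical ground-truth labels: 1 = negative publication.
labels = np.array([1, 0, 0, 1])

scores = score_documents(theta, topic_weights)
auc = roc_auc(labels, scores)
print(scores, auc)
```

A linear weighted sum is the simplest multiple-criteria aggregation; the paper's actual method may differ, for example by also conditioning on media source, property, or class.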