
Emostemmer: An Effective Program for Determining Emotions in Russian Using N-grams (Emotiograms)
Author(s) -
Mohsin Manshad Abbasi,
A. P. Bel'tyukov
Publication year - 2021
Publication title -
Intellektualʹnye sistemy v proizvodstve
Language(s) - English
Resource type - Journals
eISSN - 2410-9304
pISSN - 1813-7911
DOI - 10.22213/2410-9304-2021-4-148-157
Subject(s) - computer science, natural language processing, identification (biology), parsing, root (linguistics), artificial intelligence, word (group theory), government (linguistics), linguistics, philosophy, botany, biology
Emotions and the analysis of their expression in texts have attracted growing interest in recent years. Researchers are trying to create an intelligent machine that can not only read a text but also determine the emotional state it conveys. The results obtained can be used to prepare the machine for future predictions of the emotional orientation of texts, their authors and their readers. This kind of text analysis can also be used to gather feedback from people about a product or service, or their reaction to an event or a government policy. It includes both syntactic and semantic analysis of the text. The syntactic part consists of identifying the words that express emotions in a text. A stemmer, which extracts the stem or root of a word, plays an important role in this identification. In many languages of the Romance and Germanic groups, identifying words that express emotions is much easier than in Russian, because a single word form expresses the emotion regardless of grammatical form or gender. In a language such as Russian, where the ending of an emotionally charged word changes depending on gender, aspect, etc., the analysis becomes more complex. There are different methods of determining emotions in a text. This work focuses on identifying emotions in a text while limiting the complexity of the algorithm, so that it requires a minimum amount of memory and time. We have created the Emostemmer program, an N-gram stemmer (one in which the letters of a word are grouped into sequences of 2 letters, 3 letters, ..., N letters, called N-grams), to identify words that express emotions in a text. The performance of Emostemmer was compared with that of RuSentiLex by training and testing a support vector machine classifier with each. The results of the work are described in detail below in the “Methodology” and “Discussion” sections.
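
To make the N-gram idea concrete, the following is a minimal sketch in Python and not the Emostemmer implementation itself. It splits a word into letter sequences of length 2 up to the full word length and checks whether any of them coincides with a stem from an emotion lexicon. The function names `char_ngrams` and `find_emotions`, the three-entry lexicon `EMOTION_STEMS` and the rule of matching an N-gram at any position in the word are all assumptions made for illustration; the abstract does not specify them.

```python
def char_ngrams(word, n_min=2, n_max=None):
    """Return all contiguous letter n-grams of `word`, shortest first."""
    word = word.lower()
    if n_max is None:
        n_max = len(word)
    upper = min(n_max, len(word))
    return [word[i:i + n]
            for n in range(n_min, upper + 1)
            for i in range(len(word) - n + 1)]


# Hypothetical toy lexicon: stem -> emotion label (illustration only).
EMOTION_STEMS = {
    "радост": "joy",
    "груст": "sadness",
    "злост": "anger",
}


def find_emotions(text):
    """Return (word, emotion) pairs for words containing a lexicon stem."""
    hits = []
    for token in text.split():
        # Strip common punctuation around the token.
        word = token.strip(".,!?;:\"'()«»")
        grams = set(char_ngrams(word))
        for stem, emotion in EMOTION_STEMS.items():
            # Assumption: a stem may match an n-gram anywhere in the word.
            if stem in grams:
                hits.append((word, emotion))
    return hits


if __name__ == "__main__":
    print(char_ngrams("кот"))
    print(find_emotions("Она радостная, а он грустный."))
```

Because the match is made on the stem rather than on the full word, the masculine form "радостный" and the feminine form "радостная" are both assigned to the same emotion, which is the property the abstract highlights for Russian. Restricting the match to N-grams that start at the first letter would be a stricter alternative.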