Open Access
Analysis of the Impact of the Stylometric Characteristics of Different Levels for the Verification of Authors of the Prose
Author(s) -
A. M. Manakhova,
Nadezhda Lagutina
Publication year - 2021
Publication title -
Modelirovanie i Analiz Informacionnyh Sistem
Language(s) - English
Resource type - Journals
eISSN - 2313-5417
pISSN - 1818-1015
DOI - 10.18255/1818-1015-2021-3-260-279
Subject(s) - stylometry, computer science, classifier, natural language processing, artificial intelligence, AdaBoost, pattern recognition, information retrieval, machine learning
This article analyzes how combinations of stylometric characteristics of different levels affect the quality of authorship verification for Russian, English and French prose texts. The study covers both low-level stylometric characteristics based on words and characters (symbols) and higher-level structural characteristics. All stylometric characteristics were computed automatically with the ProseRhythmDetector program, which made it possible to analyze long works by many writers at the same time. Each text was represented by vectors of stylometric characteristics at the character, word and structure levels. In the experiments, the feature sets of these three levels were combined with each other in all possible ways. The resulting vectors were fed to various classifiers to perform verification and to identify the classifier best suited to the task. The AdaBoost classifier gave the best results: the average F-score for all languages was above 92%. Detailed verification quality scores are reported and analyzed for each author. High-level stylometric characteristics, in particular the frequencies of POS-tag N-grams, open the way to a more detailed analysis of an individual author's style. The experiments show that combining structure-level characteristics with word-level and/or character-level characteristics yields the most accurate authorship verification for literary texts in Russian, English and French. The authors also conclude that the impact of stylometric characteristics on verification quality differs from language to language.
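
The experimental loop described above (per-text feature vectors at three levels, every combination of those levels, a classifier trained per combination, F-score as the quality measure) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the feature values are supplied by the caller (the paper computes them with ProseRhythmDetector, whose interface is not described here), labels are +1 for "written by the candidate author" and -1 otherwise, a single train/test split stands in for whatever evaluation protocol the paper used, and the classifier is a hand-written discrete AdaBoost over decision stumps. All function names (`level_combinations`, `combine_features`, `train_adaboost`, `predict`, `f_score`, `evaluate_combinations`) are hypothetical.

```python
import math
from itertools import combinations

# Hypothetical level names; real features would come from a tool such as ProseRhythmDetector.
LEVELS = ("character", "word", "structure")


def level_combinations(levels=LEVELS):
    """All non-empty subsets of the feature levels: 2**3 - 1 = 7 for three levels."""
    return [combo for r in range(1, len(levels) + 1) for combo in combinations(levels, r)]


def combine_features(text_features, combo):
    """Concatenate one text's per-level feature vectors in the order given by combo."""
    return [value for level in combo for value in text_features[level]]


def train_adaboost(X, y, rounds=10):
    """Sketch of discrete AdaBoost with decision stumps; labels y must be +1 or -1."""
    n = len(X)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        best = None
        # Exhaustively pick the stump (feature, threshold, polarity) with the lowest weighted error.
        for j in range(len(X[0])):
            for threshold in sorted({x[j] for x in X}):
                for polarity in (1, -1):
                    pred = [polarity if x[j] > threshold else -polarity for x in X]
                    err = sum(w for w, p, t in zip(weights, pred, y) if p != t)
                    if best is None or err < best[0]:
                        best = (err, j, threshold, polarity, pred)
        err, j, threshold, polarity, pred = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, j, threshold, polarity))
        # With alpha > 0, misclassified texts gain weight and correct ones lose it; then renormalize.
        weights = [w * math.exp(-alpha * t * p) for w, t, p in zip(weights, y, pred)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    """Sign of the alpha-weighted vote of all stumps (ties go to +1)."""
    score = sum(a * (pol if x[j] > thr else -pol) for a, j, thr, pol in ensemble)
    return 1 if score >= 0 else -1


def f_score(y_true, y_pred, positive=1):
    """F1 for the positive class, using the identity F1 = 2TP / (2TP + FP + FN)."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


def evaluate_combinations(train, test, rounds=10):
    """train/test: lists of (text_features, label). Returns {level combination: test F1}."""
    results = {}
    for combo in level_combinations():
        X_train = [combine_features(f, combo) for f, _ in train]
        y_train = [label for _, label in train]
        model = train_adaboost(X_train, y_train, rounds)
        y_pred = [predict(model, combine_features(f, combo)) for f, _ in test]
        results[combo] = f_score([label for _, label in test], y_pred)
    return results
```

Enumerating every non-empty subset of the three levels gives seven feature sets, which is one plausible reading of "combined with each other in all possible ways". In practice a library classifier such as scikit-learn's `AdaBoostClassifier`, together with cross-validation, would likely replace the hand-written booster and the single split used here.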
