Open Access

Plagiarism Detection in Armenian Texts Using Intrinsic Stylometric Analysis

Author(s) - Yeva Maksimovna Yeshilbashian, Ariana Asatryan, Tsolak Ghukasyan

Publication year - 2021

Publication title - Trudy Instituta sistemnogo programmirovaniâ RAN (Proceedings of the Institute for System Programming of the RAS)

Language(s) - English

Resource type - Journals

eISSN - 2220-6426

pISSN - 2079-8156

DOI - 10.15514/ispras-2021-33(1)-14

Subject(s) - computer science, stylometry, artificial intelligence, natural language processing, cluster analysis, plagiarism detection, classifier (uml), style (visual arts), parsing, pattern recognition (psychology), history, archaeology

In this work we study the application of intrinsic stylometric methods to plagiarism detection in Armenian texts. We use two task setups from PAN’s series of conferences on text forensics and stylometry: style change detection and style breach detection. Style change detection aims to determine whether a text was written by more than one author, while style breach detection locates the boundaries between stylistically distinct text fragments. For these tasks, we generate synthetic test sets for three genres of text (academic, literature, and news) and use them to evaluate the effectiveness of hierarchical clustering and other relevant models from PAN conferences. We employ a standard set of character-level, lexical, and readability features, and additionally perform morphological and dependency parsing of text fragments to extract syntactic features that encode author style. The evaluation results show that the clustering-based approach fails to detect style changes correctly in longer texts and is only marginally better on shorter texts. For style breach detection, the hierarchical clustering-based approach performs better than a random baseline classifier, but the difference is not sufficient to warrant its practical use. In a complementary experiment, we show that reducing the number of features and the multicollinearity among them via PCA helps increase the precision of style breach detection methods for certain text categories.
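To make the two task setups concrete, here is a minimal sketch of a clustering-based detector. It is not the authors' implementation. The four features, the average-linkage rule, the distance threshold of 2.0, and all function names are assumptions chosen for illustration. The morphological, dependency-based, and readability features and the PCA step mentioned in the abstract are left out.

```python
import math
import re
from itertools import combinations


def style_features(fragment):
    """Toy character-level and lexical features (an assumed, simplified set)."""
    words = re.findall(r"\w+", fragment.lower())
    # Split on Latin sentence enders and the Armenian full stop (U+0589).
    sentences = [s for s in re.split(r"[.!?։]+", fragment) if s.strip()]
    n_words = max(len(words), 1)
    return [
        sum(len(w) for w in words) / n_words,                       # mean word length
        len(set(words)) / n_words,                                  # type-token ratio
        sum(ch in ",;-" for ch in fragment) / max(len(fragment), 1),  # punctuation rate
        n_words / max(len(sentences), 1),                           # words per sentence
    ]


def standardize(rows):
    """Z-score each feature column; constant columns are left at zero."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def agglomerative(points, threshold):
    """Average-linkage hierarchical clustering, stopped at a distance threshold."""
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        total = sum(math.dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        if linkage(clusters[a], clusters[b]) > threshold:
            break
        clusters[a] += clusters[b]
        del clusters[b]  # safe because a < b
    return clusters


def cluster_fragments(fragments, threshold=2.0):
    points = standardize([style_features(f) for f in fragments])
    return agglomerative(points, threshold)


def has_style_change(fragments, threshold=2.0):
    """Style change detection: more than one cluster suggests several authors."""
    return len(cluster_fragments(fragments, threshold)) > 1


def style_breaches(fragments, threshold=2.0):
    """Style breach detection: indices where the cluster label switches."""
    labels = {}
    for label, members in enumerate(cluster_fragments(fragments, threshold)):
        for i in members:
            labels[i] = label
    return [i for i in range(1, len(fragments)) if labels[i] != labels[i - 1]]


plain = "a b c. d e f."
ornate = ("Extraordinarily, complicated; terminology, notwithstanding "
          "considerations regarding internationalization.")
print(has_style_change([plain, plain, ornate, ornate]))
print(style_breaches([plain, plain, ornate, ornate]))
```

In this toy setup a document is first split into fragments. A document is flagged as multi-author when the fragments fall into more than one cluster, and a breach is reported wherever adjacent fragments land in different clusters. The threshold largely determines the outcome, which is one reason such a detector needs careful evaluation against a random baseline before any practical use.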
