Doc2Vec or Better Interpretability? A Method Study for Authorship Attribution

Elena Vladimirovna Pimonova, Oleg Durandin, Alexey Malafeev

Open Access

Publication year - 2020

Publication title - Kompʹûternaâ lingvistika i intellektualʹnye tehnologii (Computational Linguistics and Intellectual Technologies)

Language(s) - English

Resource type - Conference proceedings

ISSN - 2075-7182

DOI - 10.28995/2075-7182-2020-19-606-614

Subject(s) - computer science, interpretability, natural language processing, authorship attribution, artificial intelligence, linguistics

In this work, we perform a method study for the problem of authorship attribution in Russian and English. The datasets consist of 324 works in Russian and 207 works in English. We propose a set of text representation models that reflect various linguistic phenomena, in particular morphological and syntactic ones. A distinctive feature of the proposed models is that they are interpretable. These models are evaluated individually and in combination against a Doc2Vec baseline. For Russian, some of our models outperform Doc2Vec; for English, for various reasons, they do not. However, the proposed models can also be used together with Doc2Vec, dramatically improving its performance: by 16.79% for Russian and by 7.2% for English. Additionally, we experiment with two methods for splitting texts into blocks of K sentences (contiguous and bootstrapped) and tune the parameter K. Finally, we conduct a feature importance analysis and show which linguistic markers of authorial style are most pertinent for Russian, for English, and for both languages. All code used in this work is freely available to the community.
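
The abstract mentions two ways of splitting a text into blocks of K sentences. The sketch below is a minimal illustration under stated assumptions, not the authors' released code: "contiguous" is taken to mean consecutive, non-overlapping runs of K sentences, and "bootstrapped" is assumed to mean blocks whose K sentences are sampled with replacement from the whole text. The function names `contiguous_blocks` and `bootstrapped_blocks` and the `n_blocks` parameter are hypothetical, and sentence segmentation is assumed to have been done beforehand.

```python
import random
from typing import List, Optional


def contiguous_blocks(sentences: List[str], k: int) -> List[List[str]]:
    """Split a text into consecutive, non-overlapping blocks of k sentences.

    Assumption: a trailing remainder shorter than k is discarded.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return [sentences[i:i + k] for i in range(0, len(sentences) - k + 1, k)]


def bootstrapped_blocks(sentences: List[str], k: int, n_blocks: int,
                        seed: Optional[int] = None) -> List[List[str]]:
    """Build n_blocks blocks of k sentences, each sentence drawn with
    replacement from the whole text (an assumed reading of "bootstrapped")."""
    if k <= 0 or n_blocks < 0:
        raise ValueError("k must be positive and n_blocks non-negative")
    if not sentences:
        raise ValueError("cannot sample from an empty text")
    rng = random.Random(seed)  # seeded so the sampling is reproducible
    return [rng.choices(sentences, k=k) for _ in range(n_blocks)]


if __name__ == "__main__":
    sents = [f"s{i}" for i in range(7)]
    print(contiguous_blocks(sents, 3))  # [['s0', 's1', 's2'], ['s3', 's4', 's5']]
    print(len(bootstrapped_blocks(sents, 3, n_blocks=5, seed=0)))  # 5
```

In this sketch `k` plays the role of K, the block size the authors tune: with contiguous splitting a larger K yields fewer but longer samples per work, while the bootstrapped variant lets the number of blocks be chosen independently of text length.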

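The abstract does not specify how the interpretable models are combined with Doc2Vec. One simple option, shown here purely as an assumption, is to concatenate an interpretable feature vector with the dense Doc2Vec document vector and pass the result to a single classifier. As the interpretable part, the sketch uses relative part-of-speech frequencies over a hypothetical tag inventory (`POS_TAGS`), standing in for a morphological feature set; tags are assumed to come from an external tagger, and the embedding is passed in as a plain list of floats rather than computed here. `pos_frequency_features` and `combine` are hypothetical names.

```python
from collections import Counter
from typing import List, Sequence

# Hypothetical tag inventory standing in for a morphological feature set.
POS_TAGS = ["NOUN", "VERB", "ADJ", "ADV", "PRON", "ADP", "CONJ", "PUNCT"]


def pos_frequency_features(tags: Sequence[str],
                           inventory: Sequence[str] = POS_TAGS) -> List[float]:
    """Relative frequency of each inventory tag in a block of text.

    Each dimension is named by its tag, so it stays interpretable.
    Tags outside the inventory still count toward the total.
    """
    total = len(tags)
    if total == 0:
        return [0.0] * len(inventory)
    counts = Counter(tags)
    return [counts[tag] / total for tag in inventory]


def combine(interpretable: Sequence[float],
            embedding: Sequence[float]) -> List[float]:
    """Concatenate interpretable features with a dense document embedding
    (e.g. a Doc2Vec vector) into a single classifier input."""
    return list(interpretable) + list(embedding)


if __name__ == "__main__":
    tags = ["NOUN", "VERB", "NOUN", "PUNCT"]  # output of some external tagger
    features = pos_frequency_features(tags)
    print(features)  # [0.5, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0, 0.25]
    doc_vector = [0.1, -0.2]  # placeholder for a Doc2Vec document vector
    print(len(combine(features, doc_vector)))  # 10
```

Because the first `len(POS_TAGS)` dimensions keep their tag names, per-feature importance scores over them can be read back as linguistic markers, whereas the Doc2Vec dimensions carry no such labels.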