Finding characteristic features in stylometric analysis
Author(s) -
Carmen Klaussner,
John Nerbonne,
Çağrı Çöltekin
Publication year - 2015
Publication title -
Digital Scholarship in the Humanities
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.4
H-Index - 15
eISSN - 2055-768X
pISSN - 2055-7671
DOI - 10.1093/llc/fqv048
Subject(s) - focus (optics) , authorship attribution , set (abstract data type) , stylometry , similarity (geometry) , computer science , task (project management) , feature (linguistics) , cluster analysis , selection (genetic algorithm) , artificial intelligence , information retrieval , natural language processing , pattern recognition (psychology) , linguistics , image (mathematics) , engineering , philosophy , physics , systems engineering , optics , programming language
The usual focus in authorship studies is on authorship attribution, i.e. determining which author (of a given set) wrote a piece of unknown provenance. The usual setting involves a small number of candidate authors, which means that the focus quickly shifts to a search for features that discriminate among the candidates. Whether the features that serve to discriminate among the authors are characteristic is then not of primary importance. We respectfully suggest an alternative in this article, namely a focus on seeking features that are characteristic for an author with respect to others. To determine an author's characteristic features, we first seek elements that he or she uses consistently, which we therefore regard as 'representative', but we likewise seek elements which the author uses 'distinctively' in comparison to an opposing author. We test the idea on a recently proposed task that compares Charles Dickens to both Wilkie Collins and a larger reference set comprising several authors' works from the 18th and 19th centuries. We then compare the use of representative and distinctive features to Burrows' 'Delta' and Hoover's 'CoV Tuning'; we find that our method bears little similarity to either method in terms of characteristic feature selection. We show that our method achieves reliable and consistent results in the two-author comparison and fair results in the multi-author one, measured by separation ability in clustering.
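For readers unfamiliar with the baseline named in the abstract, the following is a minimal sketch of Burrows' Delta as it is commonly described: word frequencies are converted to z-scores across a reference corpus, and the distance between two texts is the mean absolute difference of those z-scores. It is not the authors' code and not their representative/distinctive method. The function name `burrows_delta`, the input layout, the use of the population standard deviation, and the toy frequencies are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def burrows_delta(freqs, doc_a, doc_b):
    """Mean absolute difference of z-scored word frequencies.

    freqs maps a document name to a dict of word -> relative frequency.
    This layout is a hypothetical choice for the sketch.
    """
    # Vocabulary: every word seen in any document of the corpus.
    words = sorted({w for f in freqs.values() for w in f})
    diffs = []
    for w in words:
        column = [f.get(w, 0.0) for f in freqs.values()]
        mu = mean(column)
        sigma = pstdev(column)  # assumption: population standard deviation
        if sigma == 0:
            continue  # a word with constant frequency cannot discriminate
        z_a = (freqs[doc_a].get(w, 0.0) - mu) / sigma
        z_b = (freqs[doc_b].get(w, 0.0) - mu) / sigma
        diffs.append(abs(z_a - z_b))
    return mean(diffs) if diffs else 0.0


# Made-up toy frequencies, not taken from any real text.
toy = {
    "author_a_text_1": {"the": 0.060, "of": 0.030, "upon": 0.004},
    "author_a_text_2": {"the": 0.062, "of": 0.031, "upon": 0.005},
    "author_b_text_1": {"the": 0.050, "of": 0.035, "upon": 0.001},
}

same_author = burrows_delta(toy, "author_a_text_1", "author_a_text_2")
other_author = burrows_delta(toy, "author_a_text_1", "author_b_text_1")
```

In this toy corpus `same_author` comes out smaller than `other_author`, which is the behaviour a distance-based attribution method relies on. In practice the vocabulary is usually restricted to the most frequent words of the corpus; that step is omitted here for brevity.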