Estimating the probability of an authorship attribution
Author(s) - Savoy Jacques
Publication year - 2016
Publication title - Journal of the Association for Information Science and Technology
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.903
H-Index - 145
eISSN - 2330-1643
pISSN - 2330-1635
DOI - 10.1002/asi.23455
Subject(s) - computer science, divergence (linguistics), authorship attribution, certainty, rank (graph theory), attribution, Bayes' theorem, artificial intelligence, information retrieval, statistics, natural language processing, Bayesian probability, mathematics, linguistics, psychology, epistemology, social psychology, philosophy, combinatorics
In authorship attribution, various distance‐based metrics have been proposed to determine the most probable author of a disputed text. In this paradigm, a distance is computed between each author profile and the query text. These values are then employed only to rank the possible authors. In this article, we analyze their distribution and show that we can model it as a mixture of 2 Beta distributions. Based on this finding, we demonstrate how we can derive a more accurate probability that the closest author is, in fact, the real author. To evaluate this approach, we have chosen 4 authorship attribution methods (Burrows' Delta, Kullback‐Leibler divergence, Labbé's intertextual distance, and the naïve Bayes). As the first test collection, we have downloaded 224 State of the Union addresses (from 1790 to 2014) delivered by 41 U.S. presidents. The second test collection is formed by the Federalist Papers. The evaluations indicate that the accuracy rate of some authorship decisions can be improved. The suggested method can signal that the proposed assignment should be interpreted as possible, without strong certainty. Being able to quantify the certainty associated with an authorship decision can be a useful component when important decisions must be taken.
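The abstract's central idea, turning a distance into a probability through a two-component Beta mixture and Bayes' theorem, can be illustrated with a minimal Python sketch. This is not the article's procedure: the abstract does not say how distances are normalized or how the mixture is fitted, so the sketch assumes distances already rescaled to the open interval (0, 1) and uses made-up parameters. The function names `beta_pdf` and `prob_correct`, the mixing weight, and the Beta shape values are all hypothetical.

```python
import math


def beta_pdf(x, a, b):
    """Density of a Beta(a, b) distribution at x, for 0 < x < 1."""
    if not 0.0 < x < 1.0:
        raise ValueError("x must lie strictly between 0 and 1")
    # Log of the normalizing constant 1 / B(a, b), via log-gamma for stability.
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))


def prob_correct(d, weight, correct, wrong):
    """Posterior probability that distance d comes from the 'correct author' component.

    weight  -- assumed prior share of the 'correct' component in the mixture
    correct -- (a, b) shape parameters assumed for distances to the true author
    wrong   -- (a, b) shape parameters assumed for distances to other authors
    """
    num = weight * beta_pdf(d, *correct)
    den = num + (1 - weight) * beta_pdf(d, *wrong)
    return num / den  # Bayes' theorem applied to the two-component mixture


if __name__ == "__main__":
    # Hypothetical parameters: small distances are typical of the true author.
    for d in (0.2, 0.5, 0.8):
        p = prob_correct(d, weight=0.5, correct=(2, 8), wrong=(8, 2))
        print(f"normalized distance {d:.1f} -> P(correct) = {p:.4f}")
```

With parameters like these, a small distance yields a probability near 1 and a mid-range distance yields a probability near 0.5, which is the kind of "possible, without strong certainty" signal the abstract describes.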