A comparison of classifiers and features for authorship authentication of social networking messages
Author(s) -
Li Jenny S.,
Chen LiChiou,
Monaco John V.,
Singh Pranjal,
Tappert Charles C.
Publication year - 2016
Publication title -
Concurrency and Computation: Practice and Experience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.309
H-Index - 67
eISSN - 1532-0634
pISSN - 1532-0626
DOI - 10.1002/cpe.3918
Subject(s) - stylometry, computer science, categorization, authentication (law), machine learning, voting, variety (cybernetics), artificial intelligence, social network (sociolinguistics), empirical research, feature selection, feature (linguistics), social media, anonymity, information retrieval, world wide web, computer security, mathematics, statistics, linguistics, philosophy, politics, political science, law
Summary This paper develops algorithms and investigates a variety of classifiers for determining the authenticity of short Facebook postings, which average 20.6 words. The goal of this research is to determine the degree to which such postings can be authenticated as coming from the purported user rather than from an intruder. Several sets of stylometry features and ad hoc social networking features were developed to categorize 9259 posts from 30 Facebook authors as authentic or non-authentic. An algorithm that applies machine-learning classifiers to this problem is described, and an additional voting algorithm that combines three classifiers is investigated. This research is one of the first works to focus on authorship authentication in short messages, such as postings on social network sites, and the challenges of applying traditional stylometry techniques to short messages are discussed. Experimental results demonstrate an average accuracy rate of 79.6% among 30 users. Further empirical analyses evaluate the effect of sample size, feature selection, user writing style, and classification method on authorship authentication, indicating varying degrees of success compared with previous studies. Copyright © 2016 John Wiley & Sons, Ltd.
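The summary mentions a voting algorithm that combines three classifiers but does not state the voting rule. The following is a minimal sketch that assumes simple majority voting over binary labels (1 = authentic, 0 = non-authentic). The function names `majority_vote` and `accuracy`, and the toy label lists, are hypothetical illustrations and are not taken from the paper.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Combine per-classifier label lists into one label per post.

    Assumption: plain majority voting. With three classifiers and
    binary labels there can be no ties.
    """
    if not predictions:
        raise ValueError("need predictions from at least one classifier")
    n_posts = len(predictions[0])
    if any(len(p) != n_posts for p in predictions):
        raise ValueError("all classifiers must label the same posts")
    combined = []
    for votes in zip(*predictions):  # one tuple of votes per post
        label, _count = Counter(votes).most_common(1)[0]
        combined.append(label)
    return combined


def accuracy(predicted: Sequence[int], actual: Sequence[int]) -> float:
    """Fraction of posts whose predicted label matches the true label."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("label lists must be non-empty and equal in length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Toy data (made up): three classifiers label four posts.
clf_a = [1, 0, 1, 1]
clf_b = [1, 1, 0, 1]
clf_c = [0, 0, 1, 1]
truth = [1, 0, 0, 1]

combined = majority_vote([clf_a, clf_b, clf_c])
print(combined)                   # [1, 0, 1, 1]
print(accuracy(combined, truth))  # 0.75
```

An odd number of voters keeps the binary decision unambiguous, which is one plausible reason to combine exactly three classifiers.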