Open Access
Author Profiling using Stylistic and N-Gram Features
Author(s) - Radha D*, Chandra Sekhar P
Publication year - 2019
Publication title - International Journal of Engineering and Advanced Technology
Language(s) - English
Resource type - Journals
ISSN - 2249-8958
DOI - 10.35940/ijeat.a1621.109119
Subject(s) - stylometry , profiling (computer programming) , computer science , natural language processing , artificial intelligence , social media , upload , naive bayes classifier , information retrieval , world wide web , support vector machine , operating system
The World Wide Web is growing rapidly, with massive amounts of textual content generated primarily through social media sites. Most users are reluctant to provide genuine personal details along with the text they upload to these sites. To recover correct information about authors, researchers established a research area called Authorship Analysis, which infers details about authors by examining their text. Author Profiling is one type of Authorship Analysis; it detects demographic characteristics of authors, such as age, gender, location, educational background, native language, and personality traits, by examining the writing style of their text. Stylometry is a research area that defines sets of stylometric features, namely word-based, character-based, syntactic, structural, and content-based features, for differentiating authors' writing styles. In this work, experiments were conducted with various stylistic features, N-grams, and content-based features for gender prediction. These features are used to represent documents as vectors, and classification algorithms build a model by processing these vectors. Two classification algorithms, Random Forest and Naïve Bayes Multinomial, were used. We concentrated on predicting gender on the PAN 2019 competition Twitter dataset. Our approach achieved better accuracy than many existing Author Profiling approaches.
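As a rough illustration of how documents might be represented as vectors from stylistic and N-gram features, the following minimal sketch uses only the Python standard library. The specific features (word count, average word length, punctuation and uppercase ratios), the choice of character trigrams, the function names, and the toy documents are all assumptions made for illustration; the abstract does not specify the exact feature set or implementation used in the paper.

```python
from collections import Counter
import string


def stylistic_features(text):
    """Return a few simple style measurements for one document (illustrative set)."""
    words = text.split()
    n_chars = max(len(text), 1)   # avoid division by zero on empty text
    n_words = max(len(words), 1)
    return {
        "style:word_count": float(len(words)),
        "style:avg_word_len": sum(len(w) for w in words) / n_words,
        "style:punct_ratio": sum(c in string.punctuation for c in text) / n_chars,
        "style:upper_ratio": sum(c.isupper() for c in text) / n_chars,
    }


def char_ngrams(text, n=3):
    """Count overlapping character n-grams in the lowercased text."""
    text = text.lower()
    return Counter(f"ng:{text[i:i + n]}" for i in range(len(text) - n + 1))


def vectorize(docs, n=3):
    """Map each document to a fixed-length vector over a shared feature index."""
    rows = []
    for doc in docs:
        feats = dict(char_ngrams(doc, n))
        feats.update(stylistic_features(doc))
        rows.append(feats)
    # The shared, sorted vocabulary fixes the column order for every document.
    vocab = sorted({key for row in rows for key in row})
    matrix = [[float(row.get(key, 0.0)) for key in vocab] for row in rows]
    return vocab, matrix


# Toy stand-ins for tweets; these are not taken from the PAN dataset.
docs = ["Hello world!", "hello there"]
vocab, X = vectorize(docs)
```

All values in the resulting vectors are non-negative, which is the kind of input that classifiers such as multinomial Naïve Bayes or Random Forest could then be trained on, given gender labels for each document.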