Analysis of authorship attribution technique on Urdu tweets empowered by machine learning
Publication year - 2021
Publication title - International Journal of Advanced Trends in Computer Science and Engineering
Language(s) - English
Resource type - Journals
ISSN - 2278-3091
DOI - 10.30534/ijatcse/2021/911032021
Subject(s) - stylometry , urdu , authorship attribution , computer science , phishing , similarity (geometry) , artificial intelligence , instant messaging , natural language processing , task (project management) , cosine similarity , identity (music) , spoofing attack , world wide web , linguistics , the internet , computer security , cluster analysis , philosophy , physics , management , acoustics , economics , image (mathematics)
The process of identifying the author of an anonymous document from a group of unknown documents is called authorship attribution. As the world trends towards shorter communications, online criminal activities such as phishing and bullying are also increasing. Criminals hide their identity behind a screen name and connect anonymously, which makes them difficult to trace during a cybercrime investigation. This paper evaluates current authorship attribution techniques at the linguistic level and compares their accuracy in English and Urdu contexts, using the LDA model with an n-gram technique and cosine similarity applied to stylometric features to identify the writing style of a specific author. Two datasets are used, Urdu_TD and English_TD, based on 180 English and Urdu tweets per author. The overall accuracy achieved is 84.52% on Urdu_TD and 93.17% on English_TD. The task is done without using any labels for authorship.
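
To make the n-gram and cosine similarity idea concrete, the following is a minimal sketch and not the authors' pipeline: the LDA step is omitted, character trigrams stand in for the paper's stylometric features, and the function names and toy texts are hypothetical. It also compares a tweet against small pre-grouped sets of texts purely for illustration, whereas the paper reports working without authorship labels. Because Python strings are Unicode, the same code should apply to Urdu text as well as English.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Count overlapping character n-grams after collapsing whitespace."""
    text = " ".join(text.split())
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no measurable style
    return dot / (norm_a * norm_b)


def build_profile(texts, n=3):
    """Merge the n-gram counts of several texts into one style profile."""
    profile = Counter()
    for text in texts:
        profile.update(char_ngrams(text, n))
    return profile


def most_similar_profile(text, profiles, n=3):
    """Return the key of the profile closest to `text` by cosine similarity."""
    grams = char_ngrams(text, n)
    return max(profiles, key=lambda key: cosine_similarity(grams, profiles[key]))


# Toy data (invented for illustration, not taken from Urdu_TD or English_TD).
profiles = {
    "writer_a": build_profile(["lol gonna be late again lol", "lol so tired"]),
    "writer_b": build_profile(["The meeting is scheduled for Monday.",
                               "Please review the document."]),
}
closest = most_similar_profile("lol late again", profiles)
```

Character n-grams are a common choice for short texts such as tweets because they capture spelling habits and punctuation without requiring a language-specific tokenizer; word n-grams or other stylometric features could be substituted by changing `char_ngrams`.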
