Text Analysis and Machine Learning Approach to Phished Email Detection
Author(s) -
Olasehinde Olayemi
Publication year - 2019
Publication title -
International Journal of Computer Applications
Language(s) - English
Resource type - Journals
ISSN - 0975-8887
DOI - 10.5120/ijca2019918354
Subject(s) - computer science, artificial intelligence, machine learning, natural language processing, information retrieval
Phishing, the theft of sensitive identity information, poses a serious challenge to the security of personal information. It affects countless internet users and imposes a heavy financial burden on businesses and victims alike. Text mining is a branch of data mining used to analyze large volumes of unstructured text data in order to extract meaningful information from it. Machine learning (ML) is a branch of artificial intelligence (AI) that uses data mining methods to discover new or existing characteristics in a set of gathered data that can be relevant for classification. Machine learning methods have been found to achieve much better results than other phished email detection techniques such as blacklists, visual similarity, and heuristic techniques. In this work, text mining was carried out on phished and ham emails, and three machine learning techniques, Naive Bayes, K-Nearest Neighbor (KNN), and Support Vector Machine (SVM), were used to identify phished emails on analyzed standard phished email and ham corpora. Naive Bayes achieved the highest classification accuracy at 99.0%, compared with SVM (98.6%) and KNN (96.9%).
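To make the classification step concrete, the following is a minimal sketch of a bag-of-words multinomial Naive Bayes classifier, one of the three techniques named in the abstract. It is an illustration only, not the author's implementation: the abstract does not describe the preprocessing, features, or tooling used, so the tokenizer, the add-one smoothing, the class name `NaiveBayesTextClassifier`, and the toy training messages are all assumptions made for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesTextClassifier:
    """Hypothetical multinomial Naive Bayes over word counts (add-one smoothing)."""

    def fit(self, texts, labels):
        # Number of training documents per class, used for the class priors.
        self.class_counts = Counter(labels)
        # Per-class word frequency tables.
        self.word_counts = {c: Counter() for c in self.class_counts}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        # Vocabulary is every word seen in any training document.
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def predict(self, text):
        # Words never seen in training are ignored.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(doc_count / total_docs)
            for t in tokens:
                score += math.log((counts[t] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, made-up messages for illustration (not taken from the paper's corpora).
train_texts = [
    "verify your account password now to avoid suspension",
    "urgent click this link to confirm your bank account",
    "your account is suspended click to verify password",
    "meeting moved to friday please see the agenda",
    "lunch tomorrow with the project team",
    "please review the attached project report before friday",
]
train_labels = ["phish", "phish", "phish", "ham", "ham", "ham"]

model = NaiveBayesTextClassifier().fit(train_texts, train_labels)
prediction_phish = model.predict("click here to verify your bank password")
prediction_ham = model.predict("see the agenda for the project meeting on friday")
print(prediction_phish, prediction_ham)
```

A study of this kind would normally train on a large labeled corpus and report accuracy on held-out messages. The six-message toy set here only demonstrates the mechanics and says nothing about the accuracy figures reported above.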