Exploring the impact of similarity index on the accuracy of a phishing site detection model using machine learning
Author(s) - Ondrej Danko, Tatiana A. Medvedeva
Publication year - 2021
Publication title - Journal of Physics: Conference Series
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.21
H-Index - 85
eISSN - 1742-6596
pISSN - 1742-6588
DOI - 10.1088/1742-6596/2131/2/022076
Subject(s) - phishing, computer science, similarity (geometry), artificial intelligence, index (typography), machine learning, domain (mathematical analysis), binary classification, data mining, binary number, natural language processing, support vector machine, world wide web, the internet, mathematics, mathematical analysis, arithmetic, image (mathematics)
This paper addresses the problem of detecting phishing sites using machine learning. The main goal is to study the effectiveness of various binary classification models when only lexical features are extracted from a URL. Special attention is given to features obtained from the domain by calculating its similarity index against a whitelist. After the models were trained and tested, accuracy metrics were calculated and the results compared. The lexical features that carry the greatest weight in classifying URLs are highlighted, and the advantages and disadvantages of this approach are described.
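
The abstract does not state how the similarity index is computed or which lexical features are used, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a sequence-matching ratio from Python's standard `difflib` as the similarity measure; the whitelist entries, the function names `similarity_index` and `lexical_features`, and the feature set are all hypothetical and chosen for illustration.

```python
from difflib import SequenceMatcher
from urllib.parse import urlparse

# Hypothetical whitelist: the abstract does not list the paper's actual entries.
WHITELIST = ["paypal.com", "google.com", "amazon.com"]


def similarity_index(domain, whitelist=WHITELIST):
    """Highest similarity ratio (0.0 to 1.0) between a domain and any whitelist entry.

    Assumption: difflib's ratio stands in for whatever measure the paper uses.
    """
    return max(SequenceMatcher(None, domain, trusted).ratio() for trusted in whitelist)


def lexical_features(url):
    """Extract a few illustrative lexical features from a URL string.

    The URL is expected to include a scheme (e.g. "http://"), otherwise
    urlparse cannot identify the hostname.
    """
    domain = (urlparse(url).hostname or "").lower()
    if domain.startswith("www."):
        domain = domain[4:]
    return {
        "url_length": len(url),
        "num_dots": domain.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_hyphen": "-" in domain,
        "similarity_index": similarity_index(domain),
    }


if __name__ == "__main__":
    # A look-alike domain: close to a whitelisted one, but not identical.
    print(lexical_features("http://www.paypa1.com/login"))
```

A domain that scores high but below 1.0 closely resembles a whitelisted domain without matching it, which is the kind of signal a binary classifier could use alongside the other lexical features.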