A Malicious URL Detection Model Based on Convolutional Neural Network
Author(s) - Zhiqiang Wang, Xiaorui Ren, Shuhao Li, Bingyan Wang, Jianyi Zhang, Tao Yang
Publication year - 2021
Publication title - Security and Communication Networks
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.446
H-Index - 43
eISSN - 1939-0114
pISSN - 1939-0122
DOI - 10.1155/2021/5518528
Subject(s) - computer science, embedding, pooling, word embedding, convolutional neural network, feature (linguistics), layer (electronics), convolution (computer science), artificial intelligence, data mining, pattern recognition (psychology), artificial neural network, theoretical computer science, philosophy, linguistics, chemistry, organic chemistry
With the development of Internet technology, network security faces diverse threats. In particular, attackers can spread malicious uniform resource locators (URLs) to carry out attacks such as phishing and spam, so research on malicious URL detection is significant for defending against these attacks. However, current research still has some problems: for instance, malicious features cannot be extracted efficiently, and some existing detection methods are easily evaded by attackers. To address these problems, we design a malicious URL detection model based on a dynamic convolutional neural network (DCNN). A new folding layer is added to the original multilayer convolutional network, and the pooling layer is replaced with a k-max pooling layer. In the dynamic convolution algorithm, the width of the feature maps in the intermediate layers depends on the dimension of the input vector. Moreover, the pooling layer parameters are adjusted dynamically according to the length of the input URL and the depth of the current convolutional layer, which helps extract deeper features over a wider range. In this paper, we also propose a new embedding method in which word embedding based on character embedding is used to learn the vector representation of a URL. We conduct two groups of comparative experiments. The first group consists of three experiments that use the same network structure with different embedding methods; the results show that word embedding based on character embedding achieves higher accuracy. The second group consists of three experiments that use the embedding method proposed in this paper with different network structures, to determine which network best suits our model. These experiments verify that the proposed model achieves the highest accuracy (98%) in detecting malicious URLs.
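The abstract says that a folding layer is added and that the k-max pooling size adapts to the URL length and the depth of the current convolutional layer, but it does not give the exact rule. The pure-Python sketch below assumes the model follows the dynamic k-max pooling rule and folding operation of the original DCNN of Kalchbrenner et al. (2014), where the pooling size at layer l of L is k_l = max(k_top, ceil((L - l) / L * s)) for an input of length s. The function names, `k_top`, the layer count, and the toy feature map are hypothetical illustrations, not the authors' implementation.

```python
from typing import List

# Assumption: rows are embedding dimensions, columns are URL token positions.
Matrix = List[List[float]]


def dynamic_k(layer: int, total_layers: int, seq_len: int, k_top: int) -> int:
    """Pooling size for 1-based convolutional `layer` out of `total_layers`.

    Assumed rule (Kalchbrenner et al., 2014):
    k_l = max(k_top, ceil((L - l) / L * s)), with s the input URL length.
    """
    numerator = (total_layers - layer) * seq_len
    # Integer ceiling division avoids floating-point rounding in ceil().
    return max(k_top, (numerator + total_layers - 1) // total_layers)


def k_max_pool_row(row: List[float], k: int) -> List[float]:
    """Keep the k largest values of one row, preserving their original order."""
    k = min(k, len(row))
    # Python's sort is stable even with reverse=True, so ties keep left-to-right order.
    top = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
    return [row[i] for i in sorted(top)]


def fold(feature_map: Matrix) -> Matrix:
    """Sum every two adjacent rows component-wise, halving the row count."""
    if len(feature_map) % 2:
        raise ValueError("folding needs an even number of rows")
    return [
        [a + b for a, b in zip(feature_map[i], feature_map[i + 1])]
        for i in range(0, len(feature_map), 2)
    ]


def fold_and_pool(feature_map: Matrix, layer: int, total_layers: int,
                  seq_len: int, k_top: int) -> Matrix:
    """Folding followed by dynamic k-max pooling applied to each row."""
    k = dynamic_k(layer, total_layers, seq_len, k_top)
    return [k_max_pool_row(row, k) for row in fold(feature_map)]


if __name__ == "__main__":
    url_len = 8  # hypothetical tokenized URL length
    fmap = [  # hypothetical 4-dimensional convolution output
        [3, 1, 4, 1, 5, 9, 2, 6],
        [0, 2, 0, 2, 0, 2, 0, 2],
        [1, 1, 1, 1, 1, 1, 1, 1],
        [5, 4, 3, 2, 1, 0, 1, 2],
    ]
    print(fold_and_pool(fmap, layer=1, total_layers=2, seq_len=url_len, k_top=3))
```

Because k shrinks with depth and grows with input length, a long URL keeps more positions in the lower layers than a short one, while the top layer always keeps a fixed `k_top` values so the fully connected classifier sees a constant-size input. Preserving the original order of the selected values (rather than sorting them) keeps some information about where in the URL the strongest features occurred.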