
Dual Stage Network Intrusion Detection System Through Feature Reduction
Author(s) -
Raghuvansh Raj,
Somya Gupta,
Manan Lohia,
H. C. Taneja
Publication year - 2020
Publication title -
Journal of Physics: Conference Series
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.21
H-Index - 85
eISSN - 1742-6596
pISSN - 1742-6588
DOI - 10.1088/1742-6596/1564/1/012033
Subject(s) - computer science , intrusion detection system , categorical variable , denial of service attack , data mining , false positive paradox , anomaly detection , network security , artificial intelligence , feature (linguistics) , apriori algorithm , association rule learning , machine learning , the internet , computer security , linguistics , philosophy , world wide web
With an exponential increase in the amount of data produced, transmitted, stored and exchanged over the internet, intrusion detection systems have become an integral part of modern network security. Considerable expense, time and effort are spent on the timely detection and denial of malicious users in order to preserve the key objectives of system security: confidentiality, integrity, and availability. In this paper, we propose a dual-stage algorithm for network intrusion detection systems (NIDS). Our aim is to construct an algorithm that produces few false positives and even fewer false negatives, as any IDS should. Research into network intrusion detection systems dates back to the early 1990s, when researchers initially developed rule-based algorithms such as SNORT and TCPDUMP. As the subject gained traction and importance, researchers shifted their efforts towards anomaly detection systems, using benchmark datasets to test their algorithms. In the early 2010s, several papers on NIDS were published, and most of the technological breakthroughs in this field were influenced by the theory of deep learning. We have focused on building an IDS using the benchmark NSL-KDD dataset. The dataset consists of 41 features, of which 3 are categorical and the remaining 38 are numeric. Because we opted for One-Hot Encoding of the categorical features, our feature space expands to a total of 122 features. Despite the obvious drawback of higher dimensionality, this was necessary because the categorical features have no implicit ordering among their values. The aim is to develop an efficiently compressed feature space, with the ultimate goal of a computationally light classification model that operates on the compressed feature set and uses sequential models to our benefit.
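
The growth from 41 to 122 features under One-Hot Encoding can be illustrated with a minimal sketch. It is not the authors' implementation. The helper names (fit_vocab, one_hot_encode, encoded_width) and the toy records are hypothetical. The cardinalities 3, 70 and 11 for the categorical columns protocol_type, service and flag are the values commonly reported for NSL-KDD and are assumed here, not taken from the abstract.

```python
from typing import Dict, List, Sequence, Union

Value = Union[str, int, float]
Row = Dict[str, Value]


def fit_vocab(rows: Sequence[Row], categorical: Sequence[str]) -> Dict[str, List[str]]:
    """Collect the sorted distinct values seen in each categorical column."""
    return {col: sorted({str(row[col]) for row in rows}) for col in categorical}


def one_hot_encode(rows: Sequence[Row], categorical: Sequence[str],
                   vocab: Dict[str, List[str]]) -> List[List[float]]:
    """Replace each categorical value with a 0/1 indicator vector.

    Numeric columns are passed through unchanged (cast to float).
    Column order follows the key order of each input row.
    """
    encoded = []
    for row in rows:
        out: List[float] = []
        for col, value in row.items():
            if col in categorical:
                # One indicator per known category; exactly one is 1.0.
                out.extend(1.0 if str(value) == v else 0.0 for v in vocab[col])
            else:
                out.append(float(value))
        encoded.append(out)
    return encoded


def encoded_width(n_numeric: int, cardinalities: Sequence[int]) -> int:
    """Width of the feature vector after one-hot encoding."""
    return n_numeric + sum(cardinalities)


# Toy records with made-up values; only a few of the 41 columns are shown.
rows = [
    {"duration": 0, "protocol_type": "tcp", "service": "http", "flag": "SF", "src_bytes": 181},
    {"duration": 2, "protocol_type": "udp", "service": "domain_u", "flag": "SF", "src_bytes": 45},
    {"duration": 0, "protocol_type": "icmp", "service": "ecr_i", "flag": "REJ", "src_bytes": 1032},
]
categorical = ("protocol_type", "service", "flag")

vocab = fit_vocab(rows, categorical)
X = one_hot_encode(rows, categorical, vocab)

# Assumed NSL-KDD cardinalities: 38 numeric + 3 + 70 + 11 indicator columns.
full_width = encoded_width(38, [3, 70, 11])
```

Under the assumed cardinalities, full_width evaluates to 122, matching the figure quoted in the abstract. Indicator columns impose no order on values such as tcp, udp and icmp, which is why this encoding is preferred over integer labels when the categories have no implicit ordering.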