Estimating the Sample Size for Training Intrusion Detection Systems
Author(s) - Yasmen Wahba, Ehab ElSalamouny, Ghada Eltaweel
Publication year - 2017
Publication title - International Journal of Computer Network and Information Security
Language(s) - English
Resource type - Journals
eISSN - 2074-9104
pISSN - 2074-9090
DOI - 10.5815/ijcnis.2017.12.01
Subject(s) - computer science, intrusion detection system, artificial intelligence, machine learning, classifier (uml), naive bayes classifier, sample size determination, data mining, sample (material), context (archaeology), feature selection, pattern recognition (psychology), statistics, support vector machine, mathematics, chemistry, biology, chromatography, paleontology
Intrusion detection systems (IDS) are gaining attention as network technologies grow rapidly. Most research in this field focuses on improving the performance of these systems through feature selection techniques and ensembles of classifiers. An orthogonal problem is estimating the proper sample size for training those classifiers. While the relation between sample size and classifier accuracy has been studied in other disciplines, mainly medical and biological, it has not, as far as we know, received similar attention in the context of intrusion detection. In this paper we focus on systems based on Naïve Bayes classifiers and investigate the effect of the training sample size on classification performance for the imbalanced NSL-KDD intrusion dataset. To estimate the sample size needed to achieve a required classification performance, we constructed the learning curve of the classifier for individual classes in the dataset. For this construction we performed nonlinear least squares curve fitting using two different power law models. Results showed that while the shifted power law outperforms the power law model in terms of fitting performance, it exhibited poor prediction performance. The power law, on the other hand, showed significantly better prediction performance for larger sample sizes.
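The learning-curve fitting described in the abstract can be illustrated with a minimal sketch. The abstract does not state the model equations, so the form acc(n) = c - a * n^(-b) used here is an assumption, as are the function names and the synthetic data. The solver is also a stand-in: it scans a grid of exponents b and solves for a and c in closed form at each one, which is a simple way to do least squares on this model and not necessarily the procedure the authors used.

```python
import math

# Assumed model (not taken from the paper): acc(n) = c - a * n**(-b)
# a: scale, b: decay exponent, c: asymptotic performance as n grows.

def fit_linear_part(sizes, scores, b):
    """For a fixed exponent b, solve for a and c by ordinary least squares."""
    z = [-(n ** -b) for n in sizes]          # so that scores ~ c + a * z
    m = len(sizes)
    z_mean = sum(z) / m
    y_mean = sum(scores) / m
    szz = sum((zi - z_mean) ** 2 for zi in z)
    szy = sum((zi - z_mean) * (yi - y_mean) for zi, yi in zip(z, scores))
    a = szy / szz
    c = y_mean - a * z_mean
    sse = sum((c + a * zi - yi) ** 2 for zi, yi in zip(z, scores))
    return a, c, sse

def fit_power_law(sizes, scores, b_grid=None):
    """Return (a, b, c, sse) minimising the squared error over the b grid."""
    if b_grid is None:
        b_grid = [k / 1000 for k in range(1, 2001)]   # b in (0, 2]
    best = None
    for b in b_grid:
        a, c, sse = fit_linear_part(sizes, scores, b)
        if best is None or sse < best[3]:
            best = (a, b, c, sse)
    return best

def predict(params, n):
    """Predicted performance at training size n."""
    a, b, c, _ = params
    return c - a * n ** (-b)

def required_size(params, target):
    """Smallest n whose predicted performance reaches target.

    Returns None when the target is at or above the fitted asymptote c.
    """
    a, b, c, _ = params
    if a <= 0 or target >= c:
        return None
    return math.ceil((a / (c - target)) ** (1 / b))

# Synthetic, noise-free learning curve (hypothetical values, not NSL-KDD results).
sizes = [100, 200, 400, 800, 1600, 3200]
scores = [0.95 - 0.5 * n ** -0.5 for n in sizes]

params = fit_power_law(sizes, scores)
n_needed = required_size(params, 0.94)
```

Fitting on small samples and then calling `predict` at a larger size mirrors the extrapolation question raised in the abstract: a model can fit the observed points closely and still predict poorly outside them, so the fitted curve should be checked against held-out larger sample sizes before `required_size` is trusted.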