
An Analysis of Multiclass Imbalanced Data Problem in Machine Learning for Network Attack Detections
Author(s) -
Hui Fern Soon,
Amiza Amir,
Saidatul Norlyana Azemi
Publication year - 2021
Publication title -
Journal of Physics: Conference Series
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.21
H-Index - 85
eISSN - 1742-6596
pISSN - 1742-6588
DOI - 10.1088/1742-6596/1755/1/012030
Subject(s) - c4.5 algorithm , machine learning , artificial intelligence , computer science , naive bayes classifier , classifier (uml) , multiclass classification , false positive rate , class (philosophy) , pattern recognition (psychology) , support vector machine
Machine learning is now widely used for network attack detection. The performance of a machine learning model depends on its training dataset and on the distribution of that dataset. Network attack detection is one of the problems that commonly suffers from an imbalanced data distribution, yet the effect of this imbalance is generally neglected by researchers. In this research, we therefore studied the impact of an imbalanced multiclass dataset on machine learning performance. Five state-of-the-art machine learning algorithms were used, and the classifiers that can accurately classify minority-class or majority-class instances were identified. The classifiers were evaluated using the following performance metrics: true positive rate, false positive rate, precision, F-measure, ROC area, and classification accuracy. The results show that the J48 classifier outperforms the other four classifiers in every aspect. Besides J48, Naïve Bayes also performed well as a classifier able to classify instances of the minority class accurately.
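
The per-class metrics named in the abstract can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function names (`safe_div`, `per_class_metrics`, `accuracy`), the class labels, and the confusion-matrix counts are all hypothetical, chosen only to show how true positive rate, false positive rate, precision, and F-measure are derived one-vs-rest from a multiclass confusion matrix, and why overall accuracy can hide poor minority-class performance. ROC area is omitted because it needs classifier scores rather than counts.

```python
def safe_div(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def per_class_metrics(confusion):
    """Compute one-vs-rest metrics for each class.

    confusion[i][j] is the number of instances whose true class is i
    and whose predicted class is j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    results = []
    for k in range(n):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                         # true k, predicted other
        fp = sum(confusion[i][k] for i in range(n)) - tp    # true other, predicted k
        tn = total - tp - fn - fp
        tpr = safe_div(tp, tp + fn)                         # true positive rate (recall)
        fpr = safe_div(fp, fp + tn)                         # false positive rate
        precision = safe_div(tp, tp + fp)
        f_measure = safe_div(2 * precision * tpr, precision + tpr)
        results.append(
            {"tpr": tpr, "fpr": fpr, "precision": precision, "f_measure": f_measure}
        )
    return results


def accuracy(confusion):
    """Fraction of all instances that lie on the diagonal."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(len(confusion)))
    return safe_div(correct, total)


# Hypothetical counts: two majority classes and one minority class (index 2).
labels = ["normal", "dos", "probe"]
confusion = [
    [90, 5, 5],
    [10, 85, 5],
    [2, 1, 2],
]

metrics = per_class_metrics(confusion)
for name, m in zip(labels, metrics):
    print(name, {key: round(value, 3) for key, value in m.items()})
print("accuracy", round(accuracy(confusion), 3))
```

With these made-up counts, overall accuracy is about 0.86 while the minority class has a true positive rate of only 0.4, which is the kind of gap that per-class reporting exposes.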