Open Access
Analysis of the influence of Machine Learning algorithm parameters on the results of traffic classification in real time
Author(s) - Irina A. Krasnova, MTUCI
Publication year - 2021
Publication title - T-Comm
Language(s) - English
Resource type - Journals
eISSN - 2072-8743
pISSN - 2072-8735
DOI - 10.36724/2072-8735-2021-15-9-24-35
Subject(s) - overfitting , random forest , computer science , partition (number theory) , artificial intelligence , set (abstract data type) , tree (set theory) , algorithm , network packet , node (physics) , data mining , machine learning , mathematics , artificial neural network , engineering , mathematical analysis , computer network , structural engineering , combinatorics , programming language
The paper analyzes how the parameter settings of Machine Learning algorithms affect the results of real-time traffic classification. The Random Forest and XGBoost algorithms are considered. A brief description of both methods and of the methods for evaluating classification results is given. Experiments are conducted on a dataset collected on a real network, separately for TCP and UDP flows. So that the results can be applied in real time, a special feature matrix is built from the first 15 packets of each flow. The main tunable parameters of the Random Forest (RF) algorithm are the number of trees, the split criterion, the maximum number of features considered when constructing a split, the tree depth, and the minimum number of samples in a node and in a leaf. For XGBoost, the parameters considered are the number of trees, the tree depth, the minimum number of samples in a leaf, and the fractions of features and of samples used to build each tree. Increasing the number of trees improves accuracy up to a certain point, but, as the article shows, it is important to verify that the model is not overfitted. The remaining tree parameters are used to counter overfitting. On the dataset studied, eliminating overfitting increased the classification accuracy for individual applications by 11-12% for Random Forest and by 12-19% for XGBoost. The results show that parameter tuning is an essential step in building a traffic classification model, because it helps counter overfitting and significantly increases the accuracy of the algorithm's predictions. In addition, it is shown that, when its parameters are properly configured, XGBoost, which is rarely used in traffic classification studies, becomes competitive and outperforms the widely used Random Forest.
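The abstract does not list which per-packet features make up the matrix. The following is a minimal sketch, assuming (hypothetically) that each of the first 15 packets contributes its direction-signed size and its inter-arrival time, of how a fixed-width feature row per flow could be built for early classification. The `Packet` class and the helpers `flow_features` and `feature_matrix` are illustrative names, not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence

N_PACKETS = 15  # the abstract builds features from the first 15 packets of a flow


@dataclass
class Packet:
    timestamp: float  # arrival time in seconds
    size: int         # packet length in bytes
    outgoing: bool    # hypothetical: True if sent by the flow initiator


def flow_features(packets: Sequence[Packet], n: int = N_PACKETS) -> List[float]:
    """Return one fixed-length row built from the first n packets of a flow.

    Hypothetical layout: for each packet slot, a direction-signed size
    (positive = outgoing, negative = incoming) followed by the inter-arrival
    time to the previous packet. Shorter flows are zero-padded, so every
    row has exactly 2 * n columns.
    """
    head = sorted(packets, key=lambda p: p.timestamp)[:n]
    row: List[float] = []
    prev_ts: Optional[float] = None
    for p in head:
        signed_size = p.size if p.outgoing else -p.size
        iat = 0.0 if prev_ts is None else p.timestamp - prev_ts
        row.extend([float(signed_size), iat])
        prev_ts = p.timestamp
    row.extend([0.0] * (2 * n - len(row)))  # pad flows shorter than n packets
    return row


def feature_matrix(flows: Sequence[Sequence[Packet]]) -> List[List[float]]:
    """Stack one row per flow; TCP and UDP flows would be processed separately."""
    return [flow_features(f) for f in flows]


if __name__ == "__main__":
    flow = [Packet(0.00, 60, True), Packet(0.05, 1500, False), Packet(0.07, 52, True)]
    row = flow_features(flow)
    print(len(row), row[:4])
```

Fixing the row width at 2 x 15 columns means the same model input shape applies to every flow, and a prediction can be made as soon as the first 15 packets have been observed rather than after the flow ends.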

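The tunable parameters named in the abstract correspond closely to keyword arguments of scikit-learn's `RandomForestClassifier` and xgboost's `XGBClassifier`, although the abstract does not say which libraries were used. The sketch below, assuming those two libraries, enumerates a hypothetical search grid and keeps only configurations that are not overfitted according to a simple train/validation accuracy gap. The grid values, the 0.05 threshold and the helpers `iter_configs`, `is_overfitted` and `select_best` are illustrative assumptions, not the paper's procedure. It uses only the standard library; each generated dict would still have to be passed to a model, e.g. `RandomForestClassifier(**cfg)`.

```python
from itertools import product
from typing import Any, Dict, Iterator, List, Optional, Tuple

# Parameter NAMES are real keyword arguments of scikit-learn's
# RandomForestClassifier and xgboost's XGBClassifier; the candidate VALUES are
# hypothetical and are not taken from the paper.
RF_GRID: Dict[str, List[Any]] = {
    "n_estimators": [50, 100, 200],      # number of trees
    "criterion": ["gini", "entropy"],    # split criterion
    "max_features": ["sqrt", "log2"],    # features considered per split
    "max_depth": [10, 20, None],         # tree depth (None = grow fully)
    "min_samples_split": [2, 10],        # min samples in a node to split it
    "min_samples_leaf": [1, 5],          # min samples in a leaf
}

XGB_GRID: Dict[str, List[Any]] = {
    "n_estimators": [100, 300],          # number of boosted trees
    "max_depth": [4, 6, 8],              # tree depth
    "min_child_weight": [1, 5],          # min hessian sum in a child, only approximates min leaf size
    "colsample_bytree": [0.6, 1.0],      # fraction of features per tree
    "subsample": [0.7, 1.0],             # fraction of samples per tree
}

Result = Tuple[Dict[str, Any], float, float]  # (config, train accuracy, validation accuracy)


def iter_configs(grid: Dict[str, List[Any]]) -> Iterator[Dict[str, Any]]:
    """Yield every combination of the grid as a keyword-argument dict."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))


def is_overfitted(train_acc: float, val_acc: float, max_gap: float = 0.05) -> bool:
    """Hypothetical heuristic: flag a model whose train/validation gap exceeds max_gap."""
    return train_acc - val_acc > max_gap


def select_best(results: List[Result], max_gap: float = 0.05) -> Optional[Dict[str, Any]]:
    """Pick the config with the best validation accuracy among non-overfitted ones."""
    candidates = [r for r in results if not is_overfitted(r[1], r[2], max_gap)]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r[2])[0]
```

A full grid grows multiplicatively (144 Random Forest and 48 XGBoost configurations here), so cross-validated search such as scikit-learn's `GridSearchCV`, or a randomized search, is a common alternative to a single train/validation split.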