Open Access
SELECTION OF METRIC AND CATEGORICAL ATTRIBUTES OF RARE ANOMALOUS EVENTS IN A COMPUTER SYSTEM USING DATA MINING METHODS
Author(s) - Oleg I. Sheluhin, Dmitry I. Rakovsky
Publication year - 2021
Publication title - t-comm
Language(s) - English
Resource type - Journals
eISSN - 2072-8743
pISSN - 2072-8735
DOI - 10.36724/2072-8735-2021-15-6-40-47
Subject(s) - cluster analysis, data mining, outlier, pattern recognition (psychology), categorical variable, computer science, preprocessor, anomaly detection, data pre processing, normalization (sociology), principal component analysis, artificial intelligence, mathematics, statistics, sociology, anthropology
The paper considers the labeling of multi-attribute experimental data for subsequent use with data mining methods in detecting and classifying rare anomalous events in computer systems (CS). Labeling is carried out using three methods: manual preprocessing, statistical analysis, and cluster analysis. Among the metric attributes, the authors identify two macrogroups: “integral attributes” and “impulse attributes”. Combining statistical and cluster analysis is shown to increase the accuracy of detecting anomalous events in the CS and to allow attributes to be selected by their informational significance. The expediency of manual preprocessing before clustering is demonstrated by dividing attributes into macrogroups, analyzing the density distribution with violin plots, and removing the trend component using the difference-stationary (DS) series method. Violin plots constructed for an attribute of the “integral” macrogroup show the distribution of CS states. Removing the trend component by the DS-series method, followed by normalization and reduction to absolute values, allows anomalous outliers to be labeled more accurately, although this is not always acceptable. Interpretation of the clustering results for each normalized attribute shows that the normal values of all attributes are concentrated around zero. The result of labeling is attribute-labeled data in which each attribute at each point in time is assigned one of two states: abnormal or normal.
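
The preprocessing chain named in the abstract (trend removal by differencing, normalization, reduction to absolute values, then clustering into two states) can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say which normalization or clustering algorithm the authors used, so z-score normalization and a one-dimensional two-means clustering are stand-ins, and all function names are hypothetical.

```python
from statistics import mean, pstdev


def difference(series):
    """First-order differencing, the usual way to detrend a DS series."""
    return [b - a for a, b in zip(series, series[1:])]


def normalize_abs(values):
    """Z-score normalization (an assumption), then absolute values."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [abs((v - mu) / sigma) for v in values]


def two_means_1d(values, iterations=50):
    """1-D k-means with k=2 (an assumption, not the authors' method).

    Returns True for points assigned to the cluster farther from zero.
    """
    low, high = min(values), max(values)
    labels = [False] * len(values)
    for _ in range(iterations):
        # Assign each point to the nearer of the two centroids.
        labels = [abs(v - high) < abs(v - low) for v in values]
        far = [v for v, is_far in zip(values, labels) if is_far]
        near = [v for v, is_far in zip(values, labels) if not is_far]
        if not far or not near:
            break  # only one cluster is populated; nothing to refine
        new_low, new_high = mean(near), mean(far)
        if (new_low, new_high) == (low, high):
            break  # centroids stopped moving
        low, high = new_low, new_high
    return labels


def label_attribute(series):
    """Label each step of one metric attribute: True = abnormal."""
    return two_means_1d(normalize_abs(difference(series)))


# Hypothetical "integral" attribute: a steady upward trend with one spike.
series = [10, 11, 12, 13, 14, 30, 16, 17, 18, 19]
labels = label_attribute(series)
```

Because differencing shortens the series by one, label i describes the step from sample i to sample i + 1. In this example the normal steps collapse to zero after normalization and only the two steps around the spike land in the far cluster, which matches the abstract's observation that normal values concentrate around zero.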
