SELECTION OF METRIC AND CATEGORICAL ATTRIBUTES OF RARE ANOMALOUS EVENTS IN A COMPUTER SYSTEM USING DATA MINING METHODS

Oleg I. Sheluhin, Dmitry I. Rakovsky

Open Access

Publication year - 2021

Publication title - T-Comm

Language(s) - English

Resource type - Journals

eISSN - 2072-8743

pISSN - 2072-8735

DOI - 10.36724/2072-8735-2021-15-6-40-47

Subject(s) - cluster analysis, data mining, outlier, pattern recognition (psychology), categorical variable, computer science, preprocessor, anomaly detection, data pre-processing, normalization (sociology), principal component analysis, artificial intelligence, mathematics, statistics, sociology, anthropology

This paper considers the process of labeling multi-attribute experimental data for subsequent use with data mining methods in the detection and classification of rare anomalous events in computer systems (CS). Labeling is carried out using three methods: manual preprocessing, statistical analysis, and cluster analysis. Among the metric-type attributes, the authors identify two macrogroups: “integral attributes” and “impulse attributes”. Combining statistical and cluster analysis is shown to increase the accuracy of detecting anomalous events in the CS and to allow attributes to be selected according to their information significance. The expediency of manually preprocessing data before clustering is demonstrated by dividing the attributes into macrogroups, analyzing the density distribution with violin plots, and removing the trend component using the difference-stationary (DS) series method. Violin plots constructed for an attribute of the “integral” macrogroup show the distribution of CS states. Removing the trend component by the DS-series method, normalizing, and reducing to absolute values allows anomalous outliers to be labeled more accurately, although this is not always acceptable. Interpretation of the clustering results, performed for each normalized attribute, shows that the normal values of all attributes are concentrated around zero. The result of labeling is attribute-labeled data in which each attribute at each time point is assigned one of two states: abnormal or normal.
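The preprocessing and labeling chain described in the abstract can be illustrated with a minimal sketch. The abstract does not name the normalization or the clustering algorithm, so the choices here are assumptions made only for illustration: first-order differencing stands in for DS-series trend removal, z-score scaling stands in for normalization, and a one-dimensional two-cluster k-means stands in for the cluster analysis. The function names `preprocess`, `two_means_1d`, and `label_attribute` are hypothetical and do not come from the paper.

```python
from statistics import mean, pstdev


def preprocess(series):
    """First-difference the series, z-score normalize, and take absolute values.

    Assumption: first-order differencing and z-score scaling are used here
    as simple stand-ins; the paper's exact procedure is not given in the abstract.
    """
    diffs = [b - a for a, b in zip(series, series[1:])]  # removes a linear trend
    mu, sigma = mean(diffs), pstdev(diffs)
    if sigma == 0:
        return [0.0] * len(diffs)  # constant differences: nothing stands out
    return [abs((d - mu) / sigma) for d in diffs]


def two_means_1d(values, max_iter=100):
    """One-dimensional k-means with k=2; returns (low_center, high_center)."""
    low, high = min(values), max(values)
    for _ in range(max_iter):
        near_low = [v for v in values if abs(v - low) <= abs(v - high)]
        near_high = [v for v in values if abs(v - low) > abs(v - high)]
        new_low = mean(near_low)
        new_high = mean(near_high) if near_high else high
        if (new_low, new_high) == (low, high):
            break  # centers stopped moving
        low, high = new_low, new_high
    return low, high


def label_attribute(series):
    """Label each differenced sample of one attribute as 'normal' or 'abnormal'."""
    values = preprocess(series)
    low, high = two_means_1d(values)
    # The cluster near zero is treated as normal, the distant one as abnormal.
    return ["abnormal" if abs(v - high) < abs(v - low) else "normal" for v in values]
```

For a linearly growing attribute with a single spike, such as `[10, 11, 12, 13, 14, 30, 16, 17, 18, 19]`, this sketch labels the two differences entering and leaving the spike as abnormal and the remaining seven as normal. The normal samples all map to zero after preprocessing, which is consistent with the abstract's observation that normal values concentrate around zero. Because differencing shortens the series by one, each label refers to a transition between consecutive samples rather than to a sample itself.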
