Predicting Alert Source Device using Machine Learning Algorithms

Open Access

Author(s) - Bharath M. B., D. V. Ashoka

Publication year - 2020

Publication title - International Journal of Innovative Technology and Exploring Engineering

Language(s) - English

Resource type - Journals

ISSN - 2278-3075

DOI - 10.35940/ijitee.d1526.079920

Subject(s) - computer science, machine learning, word2vec, artificial intelligence, support vector machine, set (abstract data type), sentence, artificial neural network, binary classification, task (project management), data mining, management, embedding, economics, programming language

In a large distributed virtualized environment, predicting the alerting source from the alert text is a daunting task. This paper explores using machine learning algorithms to solve this problem. Unfortunately, our training dataset is highly imbalanced: 96% of the alert data is reported by 24% of the alerting sources. Such a dataset is to be expected in any live distributed virtualized environment, where a new version of a device generates relatively few alerts compared to older devices. Any classification effort with such an imbalanced dataset presents a different set of challenges compared to binary classification. This skewed data distribution makes conventional machine learning less effective, especially when predicting alerts from minority device types. Our challenge is to build a robust model that can cope with this imbalanced dataset and achieve a relatively high level of prediction accuracy. This research work started with traditional regression and classification algorithms using a bag-of-words model. Then word2vec and doc2vec models were used to represent the words as vectors, which preserve the semantic meaning of the sentence. As a result, alert texts with similar messages have similar vector representations. The vectorized alert text was used with Logistic Regression for model building. This yielded better accuracy, but the model is relatively complex and demands more computational resources. Finally, a simple neural network was applied to this multi-class text classification problem using the Keras and TensorFlow libraries. A simple two-layer neural network yielded 99% accuracy, even though our training dataset was not balanced. This paper presents a qualitative evaluation of the different machine learning algorithms and their respective results. The two-layer neural network is selected as the final solution, since it requires relatively less time and fewer resources while achieving better accuracy.
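
The abstract's point about skewed data and minority device types can be illustrated with a small sketch. The labels below are invented toy data, not the paper's dataset, and the names `old_device` and `new_device` are hypothetical. The sketch shows how a classifier that always predicts the majority class can score high overall accuracy while never identifying a minority-class alert, which is why per-class recall is often worth checking alongside accuracy on imbalanced data.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def per_class_recall(y_true, y_pred):
    """Recall per class: correct predictions / true occurrences of that class."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}

# Hypothetical toy labels (assumption, not the paper's data):
# 96 alerts from an older device type and 4 from a newer one.
y_true = ["old_device"] * 96 + ["new_device"] * 4

# A degenerate classifier that always predicts the majority class.
y_pred = ["old_device"] * 100

overall = accuracy(y_true, y_pred)
recalls = per_class_recall(y_true, y_pred)
macro_recall = sum(recalls.values()) / len(recalls)

print(f"accuracy:     {overall:.2f}")
print(f"recall:       {recalls}")
print(f"macro recall: {macro_recall:.2f}")
```

In this toy case the overall accuracy is high even though recall for the minority class is zero, so the macro-averaged recall gives a more cautious picture of the same predictions.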
