Adaptive real‐time anomaly detection in cloud infrastructures
Author(s) - Agrawal Bikash, Wiktorski Tomasz, Rong Chunming
Publication year - 2017
Publication title - Concurrency and Computation: Practice and Experience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.309
H-Index - 67
eISSN - 1532-0634
pISSN - 1532-0626
DOI - 10.1002/cpe.4193
Subject(s) - cloud computing, computer science, anomaly detection, scalability, distributed computing, spark (programming language), data mining, anomaly (physics), principal component analysis, real time computing, artificial intelligence, database, operating system, physics, programming language, condensed matter physics
Summary - Cloud computing has become increasingly popular, which has led many individuals and organizations towards cloud storage systems. This move is motivated by benefits such as shared storage, computation, and transparent service among a massive number of users. However, cloud‐computing systems require the maintenance of complex and large‐scale systems with practically unavoidable runtime problems caused by hardware and software faults. Large systems are very complex due to heterogeneity, dynamicity, scalability, hidden complexity, and time limitations. Automatic anomaly detection is a critical technique for managing such complex cloud resources. This paper proposes a scalable model for automatic anomaly detection on a large system like a cloud. The anomaly detection process is capable of issuing a correct early warning of unusual behavior in dynamic environments after learning the system's characteristics during normal operation. To detect unusual activity in the cloud, we need to monitor the data center and collect cloud performance logs. In this paper, we propose an adaptive anomaly detection mechanism, which investigates the principal components of the performance metrics. It transforms the performance metrics into a low‐rank matrix and calculates the orthogonal distance using the Robust PCA algorithm. The proposed model updates itself recursively, learning and adjusting the threshold value to minimize reconstruction errors. This paper also investigates robust principal component analysis in distributed environments using Apache Spark as the underlying framework. It specifically addresses cases in which normal operation might exhibit multiple hidden modes. The accuracy and sensitivity of the model were tested on Amazon CloudWatch and Yahoo! datasets. The model achieved an accuracy of 88.54%.
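The mechanism described in the summary (project the metrics onto a low-rank subspace, score each sample by its orthogonal distance, and keep re-learning the threshold) can be illustrated with a minimal single-machine sketch. This is not the authors' implementation: it uses ordinary PCA via SVD as a stand-in for Robust PCA, it does not use Apache Spark, and the function names and the `quantile`, `window`, and `refit_every` parameters are assumptions made only for illustration.

```python
import numpy as np

def fit_pca(X, k):
    """Fit a rank-k PCA model on X (rows = samples, columns = metrics)."""
    mean = X.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]

def orthogonal_distance(X, mean, components):
    """Distance from each sample to its projection onto the PCA subspace."""
    centered = X - mean
    reconstructed = centered @ components.T @ components
    return np.linalg.norm(centered - reconstructed, axis=1)

def detect(train, stream, k=1, quantile=0.99, window=200, refit_every=50):
    """Flag stream samples whose orthogonal distance exceeds the threshold.

    Assumption: the threshold is a quantile of the distances seen in a
    sliding window of samples considered normal, refit periodically.
    """
    normal = list(train[-window:])
    W = np.array(normal)
    mean, comps = fit_pca(W, k)
    threshold = np.quantile(orthogonal_distance(W, mean, comps), quantile)

    flags = []
    for i, x in enumerate(stream, start=1):
        d = orthogonal_distance(x[None, :], mean, comps)[0]
        is_anomaly = bool(d > threshold)
        flags.append(is_anomaly)
        if not is_anomaly:
            # Only samples judged normal update the model's window.
            normal.append(x)
            normal = normal[-window:]
        if i % refit_every == 0:
            # Re-learn the subspace and the threshold from recent data.
            W = np.array(normal)
            mean, comps = fit_pca(W, k)
            threshold = np.quantile(
                orthogonal_distance(W, mean, comps), quantile
            )
    return flags
```

Calling `detect(train, stream)` with two NumPy arrays of shape `(samples, metrics)` returns one boolean per streamed sample. A quantile threshold flags a small share of normal samples by construction, so the quantile would need tuning against labelled data in any real deployment.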
