DILAF: A framework for distributed analysis of large‐scale system logs for anomaly detection

Astekin Merve; Zengin Harun; Sözer Hasan


Author(s) - Astekin Merve, Zengin Harun, Sözer Hasan

Publication year - 2019

Publication title - Software: Practice and Experience

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.437

H-Index - 70

eISSN - 1097-024X

pISSN - 0038-0644

DOI - 10.1002/spe.2653

Subject(s) - computer science, scalability, anomaly detection, context (archaeology), data mining, parsing, distributed computing, scale (ratio), source code, system monitoring, anomaly (physics), artificial intelligence, database, programming language, operating system, biology, condensed matter physics, paleontology, physics, quantum mechanics

Summary

System logs are a rich source of information for detecting and predicting anomalies. However, they can contain a huge volume of data, which is usually unstructured or semistructured. We introduce DILAF, a framework for distributed analysis of large‐scale system logs for anomaly detection. DILAF comprises several processes that facilitate log parsing, feature extraction, and machine learning activities. It has two distinguishing features with respect to existing tools. First, it does not require the source code of the analyzed system to be available. Second, it is designed to perform all of these processes in a distributed manner to support scalable analysis in the context of large‐scale distributed systems. We discuss the software architecture of DILAF and introduce an implementation of it. We conducted controlled experiments on two datasets to evaluate the effectiveness of the framework. In particular, we evaluated its performance and scalability under various degrees of parallelism. Results showed that DILAF maintains the same accuracy levels while achieving more than 30% performance improvement on average as the system scales, compared to baseline approaches that do not employ fully distributed processing.
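The three stages named in the summary (log parsing, feature extraction, and a detection step) can be illustrated with a minimal sketch. It is not DILAF's implementation, which the summary does not describe. The log format (`<session_id> <message>`), all function names, and the rare-template rule that stands in for the machine learning stage are assumptions made for illustration. A local thread pool stands in for a cluster framework, only to show that per-partition results can be computed independently and then merged.

```python
import re
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Assumed log format (hypothetical): "<session_id> <free-text message>"
NUMBER = re.compile(r"\d+")

def parse_line(line):
    """Parsing: split into (session_id, template). Digits are masked so that
    messages differing only in parameters share one template."""
    session_id, message = line.split(" ", 1)
    return session_id, NUMBER.sub("<*>", message.strip())

def extract_features(lines):
    """Feature extraction for one partition: template counts per session."""
    counts = {}
    for line in lines:
        session_id, template = parse_line(line)
        counts.setdefault(session_id, Counter())[template] += 1
    return counts

def merge(partials):
    """Combine the per-partition count tables into a single table."""
    merged = {}
    for partial in partials:
        for session_id, counter in partial.items():
            merged.setdefault(session_id, Counter()).update(counter)
    return merged

def flag_anomalies(features, min_support=2):
    """Stand-in for the learning stage: flag sessions that contain a
    template seen in fewer than min_support sessions."""
    support = Counter(t for counter in features.values() for t in counter)
    return sorted(
        session_id
        for session_id, counter in features.items()
        if any(support[t] < min_support for t in counter)
    )

def analyze(lines, partitions=2):
    """Partition the lines, extract features concurrently, merge, detect."""
    chunks = [lines[i::partitions] for i in range(partitions)]
    with ThreadPoolExecutor(max_workers=partitions) as pool:
        partials = list(pool.map(extract_features, chunks))
    return flag_anomalies(merge(partials))

# Invented sample lines, not taken from the paper's datasets.
logs = [
    "s1 Receiving block 101",
    "s1 Block 101 stored",
    "s2 Receiving block 102",
    "s2 Block 102 stored",
    "s3 Receiving block 103",
    "s3 Exception while writing block 103",
]
print(analyze(logs))  # ['s3']
```

Because the per-partition counts are merged by addition, the flagged sessions do not depend on how the lines are partitioned. This is the property that lets the degree of parallelism change without changing the result.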
