Open Access

DATA ANALYSIS PLATFORM FOR STREAM AND BATCH DATA PROCESSING ON HYBRID COMPUTING RESOURCES

Author(s) - Sergey Belov, Ivan Kadochnikov, V. Korenkov, Andrey Reshetnikov, R. Semenov, P. Zrelov

Publication year - 2021

Publication title - 9th International Conference "Distributed Computing and Grid Technologies in Science and Education"

Language(s) - English

Resource type - Conference proceedings

DOI - 10.54546/mlit.2021.31.67.001

Subject(s) - computer science, big data, spark (programming language), provisioning, distributed computing, stream processing, data stream mining, software, cloud computing, anomaly detection, data science, database, operating system, data mining, programming language

The modern Big Data ecosystem provides tools for building a flexible platform that processes both data streams and batch datasets. Supporting the operation of large modern particle physics experiments, along with the services that many individual physics researchers rely on, generates and transfers large amounts of semi-structured data. Applying current technologies to study these data flows is therefore a promising way to make service provisioning more effective. In this work, we describe the structure and implementation of our data analysis platform, built on an Apache Spark cluster. With official support for GPU computing now available in Spark version 3, we propose an architectural change that uses these more performant resources while keeping the platform functionality provided by mainstream Big Data software. The need for GPU support also entails moving the computing resource management infrastructure from Apache Mesos to Kubernetes. Finally, to demonstrate the features and operation of the system, we use the task of network packet analysis for security monitoring and anomaly detection in both batch and stream modes.
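
The batch and stream use case named in the abstract can be illustrated with a minimal sketch. This is not the platform's implementation, which runs on Apache Spark. It is a plain-Python stand-in showing how one detection routine can serve both a stored dataset and a live feed. The names `Packet`, `window_counts`, and `detect`, the record fields, and the rolling-deviation rule are all assumptions made for illustration. The abstract does not state which detection method the authors use.

```python
from collections import deque
from statistics import mean, pstdev
from typing import Iterable, Iterator, NamedTuple, Tuple


class Packet(NamedTuple):
    """Hypothetical packet record; the field names are assumptions."""
    ts: float   # capture time in seconds
    src: str    # source address


def window_counts(packets: Iterable[Packet], width: float) -> Iterator[Tuple[int, int]]:
    """Yield (window_index, packet_count) for packets sorted by timestamp."""
    current, count = None, 0
    for p in packets:
        idx = int(p.ts // width)
        if current is None:
            current = idx
        while idx > current:
            # Close the finished window; gaps produce empty windows.
            yield current, count
            current, count = current + 1, 0
        count += 1
    if current is not None:
        yield current, count


def detect(counts: Iterable[Tuple[int, int]], history: int = 5,
           threshold: float = 3.0) -> Iterator[int]:
    """Yield indices of windows whose count is far from the recent average."""
    recent = deque(maxlen=history)
    for idx, n in counts:
        if len(recent) == history:
            mu, sigma = mean(recent), pstdev(recent)
            # Floor sigma at 1.0 so a perfectly flat history does not
            # flag every small fluctuation.
            if abs(n - mu) > threshold * max(sigma, 1.0):
                yield idx
        recent.append(n)
```

Because both functions are generators, the same pipeline `detect(window_counts(source, 1.0))` works in either mode: in batch mode `source` is a list loaded from storage, and in stream mode it is any iterator that yields packets as they arrive. The sketch assumes packets arrive in timestamp order; a real stream processor would also need to handle late and out-of-order records.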
