
DatApollo: Orchestration of Serverless Functions for Scalable Data Mining
Author(s) -
Mahtab Shahin,
Markus Bertl,
Nasim Janatian,
Juan Aznar-Poveda,
Syed Attique Shah,
Thomas Fahringer,
Sijo Arakkal Peious,
Dirk Draheim
Publication year - 2025
Publication title -
IEEE Access
Language(s) - English
Resource type - Magazines
SCImago Journal Rank - 0.587
H-Index - 127
eISSN - 2169-3536
DOI - 10.1109/access.2025.3591712
Subject(s) - aerospace; bioengineering; communication, networking and broadcast technologies; components, circuits, devices and systems; computing and processing; engineered materials, dielectrics and plasmas; engineering profession; fields, waves and electromagnetics; general topics for engineers; geoscience; nuclear engineering; photonics and electrooptics; power, energy and industry applications; robotics and control systems; signal processing and analysis; transportation
The exponential growth of data generated from enterprise systems, social networks, and Internet of Things (IoT) devices presents major scalability and efficiency challenges for traditional data mining techniques. Among these, Association Rule Mining (ARM) – a foundational unsupervised learning method for discovering frequent patterns in transactional datasets – frequently encounters performance bottlenecks in distributed environments due to excessive memory consumption, static resource provisioning, and costly data shuffling. This paper introduces DatApollo, a novel serverless orchestration framework designed to support scalable and efficient execution of distributed ARM workflows. Built on the Apollo orchestration engine, DatApollo employs stateless cloud functions with dynamic scheduling, intermediate state persistence, and fine-grained fault-tolerant coordination to address the limitations of both traditional cluster-based architectures and existing Function-as-a-Service (FaaS) models. The framework decomposes Apriori-style ARM pipelines into orchestrated micro-functions, enabling elastic, cloud-native execution with minimal idle overhead. We detail the architectural design, algorithmic components, and computational complexity of DatApollo, followed by a thorough experimental evaluation using real-world healthcare and meteorological datasets. Compared to Apache Spark, DatApollo achieves up to a fivefold speedup in execution time and significantly lowers infrastructure costs by leveraging elastic scaling and event-driven function invocation. These results position DatApollo as a robust, cost-effective, and high-performance alternative for ARM in dynamic, large-scale data environments.
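The decomposition of an Apriori-style pipeline into small stateless steps can be pictured with the following minimal Python sketch. It is an illustration only: the function names (`count_partition`, `merge_counts`, `next_candidates`, `apriori`) and the local driver loop are assumptions made for this example and are not DatApollo's or Apollo's API. In a serverless setting each call to `count_partition` would presumably be a separate function invocation, with intermediate results persisted between rounds; here everything runs in one process.

```python
from collections import Counter
from itertools import combinations


def count_partition(partition, candidates):
    """Stateless 'map' step: count candidate itemsets in one data partition."""
    counts = Counter()
    for transaction in partition:
        items = set(transaction)
        for candidate in candidates:
            if candidate <= items:  # candidate is a subset of the transaction
                counts[candidate] += 1
    return counts


def merge_counts(partial_counts, min_count):
    """'Reduce' step: sum partial counts and keep itemsets meeting min_count."""
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return {itemset: n for itemset, n in total.items() if n >= min_count}


def next_candidates(frequent, k):
    """Build k-itemsets whose (k-1)-subsets are all frequent (Apriori pruning)."""
    items = sorted({item for itemset in frequent for item in itemset})
    return [
        frozenset(combo)
        for combo in combinations(items, k)
        if all(frozenset(sub) in frequent for sub in combinations(combo, k - 1))
    ]


def apriori(partitions, min_count):
    """Local stand-in for an orchestrator: one map/reduce round per itemset size."""
    candidates = list(
        {frozenset([item]) for part in partitions for t in part for item in t}
    )
    result, k = {}, 1
    while candidates:
        # Each partition could be handled by an independent function invocation.
        partials = [count_partition(part, candidates) for part in partitions]
        frequent = merge_counts(partials, min_count)
        result.update(frequent)
        k += 1
        candidates = next_candidates(frequent, k)
    return result
```

Because `count_partition` depends only on its inputs, the per-partition calls in each round are independent of one another, which is the property that makes this style of pipeline a natural fit for elastic, event-driven execution.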