Decentralized executions of privacy awareness data analytics workflows in the cloud
Author(s) -
Yao Yan,
Cao Jian,
Qian Shiyou,
Feng Shanshan
Publication year - 2018
Publication title -
Concurrency and Computation: Practice and Experience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.309
H-Index - 67
eISSN - 1532-0634
pISSN - 1532-0626
DOI - 10.1002/cpe.5063
Subject(s) - cloud computing , computer science , workflow , analytics , workflow engine , workflow management system , software deployment , distributed computing , workflow technology , database , operating system
Summary - With the development of cloud computing technology, an increasing number of enterprises and organizations have migrated applications to the cloud. Because of privacy concerns, a company may store sensitive data on a local server or in a private cloud. As a result, data analytics tasks have to be performed in the local environment, even when the analysis results will be shared externally. In this paper, we design a decentralized workflow system to speed up the execution of a Privacy‐Awareness Data Analytics (PADA) application. Specifically, a workflow system with an actor layer and an engine layer is proposed for the distributed execution of a PADA. The actors perform the actual data analysis tasks on the data side, while the engines are deployed in different regions of the public cloud and communicate with the actors to execute the whole workflow. Since engines in different regions experience different communication latencies when invoking an actor, an optimized engine deployment scheme must be found to minimize the total execution time of a PADA. A path search‐based heuristic algorithm is designed to select suitable cloud regions in which to place the engines. The experimental results indicate that the proposed algorithm is effective in reducing application makespan.
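The placement problem described in the summary can be illustrated with a minimal sketch. This is not the paper's path search-based heuristic: the cost model (per-task invocation latency plus an engine-to-engine hand-off latency), the function names, and all numbers below are assumptions made for illustration only. The sketch computes the makespan of a workflow under a given engine placement and compares a naive "nearest region per task" placement with a single-region placement.

```python
from graphlib import TopologicalSorter


def makespan(deps, run_time, actor_of, region_of, invoke_latency, region_latency):
    """Critical-path length of a workflow under a given engine placement.

    Assumed (hypothetical) cost model:
      deps:           task -> set of predecessor tasks
      run_time:       task -> time the actor spends on the task
      actor_of:       task -> actor that performs the task
      region_of:      task -> region of the engine that drives the task
      invoke_latency: (region, actor) -> latency of invoking the actor
      region_latency: (region, region) -> latency of an engine-to-engine hand-off
    """
    finish = {}
    for task in TopologicalSorter(deps).static_order():
        region = region_of[task]
        # A task can start once every predecessor has finished and handed off.
        ready = max(
            (finish[p] + region_latency[(region_of[p], region)]
             for p in deps.get(task, ())),
            default=0.0,
        )
        finish[task] = ready + invoke_latency[(region, actor_of[task])] + run_time[task]
    return max(finish.values(), default=0.0)


def nearest_region_placement(tasks, actor_of, regions, invoke_latency):
    """Naive baseline: drive each task from the region closest to its actor."""
    return {
        t: min(regions, key=lambda r: invoke_latency[(r, actor_of[t])])
        for t in tasks
    }


# Hypothetical two-task workflow: task "a" must finish before task "b".
deps = {"a": set(), "b": {"a"}}
run_time = {"a": 10, "b": 10}
actor_of = {"a": "A1", "b": "A2"}
regions = ["east", "west"]
invoke_latency = {
    ("east", "A1"): 1, ("west", "A1"): 5,
    ("east", "A2"): 4, ("west", "A2"): 2,
}
region_latency = {
    ("east", "east"): 0, ("west", "west"): 0,
    ("east", "west"): 3, ("west", "east"): 3,
}

greedy = nearest_region_placement(deps, actor_of, regions, invoke_latency)
all_east = {"a": "east", "b": "east"}

greedy_span = makespan(deps, run_time, actor_of, greedy, invoke_latency, region_latency)
east_span = makespan(deps, run_time, actor_of, all_east, invoke_latency, region_latency)
```

With these made-up latencies, the per-task nearest-region choice yields a longer makespan than keeping both engines in one region, because the hand-off between regions outweighs the saved invocation latency. Under this assumed model, that is one reason a placement has to be evaluated along whole workflow paths rather than task by task.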