Premium
An efficient list scheduling algorithm with task duplication for scientific big data workflow in heterogeneous computing environments
Author(s) -
Ahmad Wakar,
Alam Bashir
Publication year - 2020
Publication title -
concurrency and computation: practice and experience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.309
H-Index - 67
eISSN - 1532-0634
pISSN - 1532-0626
DOI - 10.1002/cpe.5987
Subject(s) - computer science , workflow , distributed computing , scheduling (production processes) , parallel computing , big data , dynamic priority scheduling , job shop scheduling , database , data mining , embedded system , computer network , mathematical optimization , mathematics , routing (electronic design automation) , quality of service
Summary A high‐performance heterogeneous computing environment provides computing resources for efficient computation of scientific Big Data workflow applications. A Big Data workflow application comprises thousands of interdependent tasks with precedence constraints. Basically, the performance of heterogeneous computing systems mainly depends on the workflow scheduling algorithms. These workflow scheduling algorithms are considered an NP‐complete problem. In this article, a List Scheduling with Task Duplication (LSTD) algorithm is proposed that efficiently minimizes the makespan of workflow applications. The LSTD introduces task duplication strategy in the list scheduling algorithm without increasing the overall time complexity. The overall functionality of LSTD mainly consists of three phases. In the first phase, it calculates the rank of the tasks for deciding the scheduling order. The next step is responsible for duplicating the entry task on the processor only if it increases the overall efficiency and avoids processor overloading. Finally, in the last step, the processor is assigned to the tasks based on the popular insertion‐based policy that attempts to insert the task among two earlier assigned tasks on a given processor in earliest idle time. In order to verify the usefulness of the proposed algorithm, several existing well‐known algorithms, such as Heterogeneous Earliest Finish Time (HEFT), Critical Path on a Processor (CPOP), and Predict Earliest Finish Time (PEFT) are considered for comparison. A non‐duplication version of LSTD named List Scheduling without Task Duplication algorithm is also considered for performance evaluation. The experimental analysis based on scientific Big Data workflows (CyberShake, Montage, and LIGO) proves that LSTD significantly surpasses all considered scheduling heuristics concerning schedule length ratio, the percentage of best results, and average running time metrics.