An efficient list scheduling algorithm with task duplication for scientific big data workflow in heterogeneous computing environments | Zendy

Ahmad Wakar | Zendy; Alam Bashir | Zendy

Premium

An efficient list scheduling algorithm with task duplication for scientific big data workflow in heterogeneous computing environments

Author(s) -

Ahmad Wakar,

Alam Bashir

Publication year - 2020

Publication title -

concurrency and computation: practice and experience

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.309

H-Index - 67

eISSN - 1532-0634

pISSN - 1532-0626

DOI - 10.1002/cpe.5987

Subject(s) - computer science , workflow , distributed computing , scheduling (production processes) , parallel computing , big data , dynamic priority scheduling , job shop scheduling , database , data mining , embedded system , computer network , mathematical optimization , mathematics , routing (electronic design automation) , quality of service

Summary A high‐performance heterogeneous computing environment provides computing resources for efficient computation of scientific Big Data workflow applications. A Big Data workflow application comprises thousands of interdependent tasks with precedence constraints. Basically, the performance of heterogeneous computing systems mainly depends on the workflow scheduling algorithms. These workflow scheduling algorithms are considered an NP‐complete problem. In this article, a List Scheduling with Task Duplication (LSTD) algorithm is proposed that efficiently minimizes the makespan of workflow applications. The LSTD introduces task duplication strategy in the list scheduling algorithm without increasing the overall time complexity. The overall functionality of LSTD mainly consists of three phases. In the first phase, it calculates the rank of the tasks for deciding the scheduling order. The next step is responsible for duplicating the entry task on the processor only if it increases the overall efficiency and avoids processor overloading. Finally, in the last step, the processor is assigned to the tasks based on the popular insertion‐based policy that attempts to insert the task among two earlier assigned tasks on a given processor in earliest idle time. In order to verify the usefulness of the proposed algorithm, several existing well‐known algorithms, such as Heterogeneous Earliest Finish Time (HEFT), Critical Path on a Processor (CPOP), and Predict Earliest Finish Time (PEFT) are considered for comparison. A non‐duplication version of LSTD named List Scheduling without Task Duplication algorithm is also considered for performance evaluation. The experimental analysis based on scientific Big Data workflows (CyberShake, Montage, and LIGO) proves that LSTD significantly surpasses all considered scheduling heuristics concerning schedule length ratio, the percentage of best results, and average running time metrics.

This content is not available in your region!

Continue researching here.

Having issues? You can contact us here

Accelerating Research