z-logo
Premium
Approximation algorithms and heuristics for task scheduling in data‐intensive distributed systems
Author(s) -
Póvoa Marcelo G.,
Xavier Eduardo C.
Publication year - 2018
Publication title -
international transactions in operational research
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.032
H-Index - 52
eISSN - 1475-3995
pISSN - 0969-6016
DOI - 10.1111/itor.12527
Subject(s) - computer science , heuristics , scheduling (production processes) , mathematical optimization , heuristic , distributed computing , task (project management) , algorithm , mathematics , artificial intelligence , management , economics , operating system
In this work, we are interested in the problem of task scheduling on large‐scale data‐intensive computing systems. In order to achieve good performance, one must construct not only good task schedules but also good data allocation across nodes on the system, since before a task can be executed, it must have access to data distributed on the system. In this article, we present a general formulation of a static problem that combines both scheduling and replication problems in data‐intensive distributed systems. We show that this problem does not admit an approximation algorithm. However, considering a restricted version of the problem that considers some practical constraints, an approximation algorithm can be designed. From a practical perspective, we introduce a novel heuristic for the problem that is based on nodes clustering. We compare the heuristic with two adapted approaches from other works in the literature by computational simulations using an extensive set of instances based on real computer grids. We show that our heuristic often obtains the best solutions and also runs faster than other approaches.

This content is not available in your region!

Continue researching here.

Having issues? You can contact us here