Storage-Tag-Aware Scheduler for Hadoop Cluster | Zendy

Nawab Muhammad Faseeh Qureshi | Zendy; Dong Ryeol Shin | Zendy; Isma Farah Siddiqui | Zendy; Bhawani Shankar Chowdhry | Zendy

AI Assistant Blog Pricing

Home ZAIA Blog

Open Access

Storage-Tag-Aware Scheduler for Hadoop Cluster

Author(s) -

Nawab Muhammad Faseeh Qureshi,

Dong Ryeol Shin,

Isma Farah Siddiqui,

Bhawani Shankar Chowdhry

Publication year - 2017

Publication title -

ieee access

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.587

H-Index - 127

ISSN - 2169-3536

DOI - 10.1109/access.2017.2725318

Subject(s) - aerospace , bioengineering , communication, networking and broadcast technologies , components, circuits, devices and systems , computing and processing , engineered materials, dielectrics and plasmas , engineering profession , fields, waves and electromagnetics , general topics for engineers , geoscience , nuclear engineering , photonics and electrooptics , power, energy and industry applications , robotics and control systems , signal processing and analysis , transportation

Big data analytics has simplified the processing complexity of extremely large data sets through ecosystems, such as Hadoop, MapR, and Cloudera. Apache Hadoop is an open-source ecosystem that manages large data sets in a distributed environment. MapReduce is a programming model that processes massive amount of unstructured data sets over Hadoop cluster. Recently, Hadoop enhances its homogeneous storage function to heterogeneous storage and stores data sets into multiple storage media, i.e., SSD, RAM, and DISK. This development increases the performance of data block placement strategy and allows a client to store large data sets into multiple storage media efficiently than homogeneous storage. However, this evolution increases the consumption of computing capacity and memory usage over MapReduce job scheduling. The scheduler processes MapReduce job into homogeneous container having configuration of CPU, memory, DISK volume, and network I/O, and accesses, processes, and stores data sets over heterogeneous storage media. This produces a processing latency of locating and pairing source data set to MapReduce tasks and results an abnormal high consumption of computing capacity and memory usage in a Datanode. Similarly, when scheduler assigns MapReduce jobs to multiple Datanodes, the same processing latency can severely affect the performance of whole cluster. In this paper, we present Storage-Tag-Aware Scheduler (STAS) that reduces processing latency by scheduling MapReduce jobs into heterogeneous storage containers, i.e., SSD, DISK, and RAM container. STAS endorses job with a tag of storage media, such as JobSSD, JobDISK, and JobRAM and parses them into heterogeneous shared-queues, which assign processing configuration to enlist jobs. STAS manager then schedules shared-queue jobs into heterogeneous MapReduce containers and generates an output into storage media of the cluster. The experimental evaluation shows that STAS optimizes the consumption of computing capacity and memory usage efficiently than available schedulers in a Hadoop cluster.

The content you want is available to Zendy users.

Already have an account? Click here to sign in.

Having issues? You can contact us here

Accelerating Research