A two steps method of resources utilization predication for large Hadoop data center
Author(s) -
Yu Lei,
Teng Fei,
Ning Shangming,
Li Yunshu,
Cui Zhe,
Du Shengdong
Publication year - 2020
Publication title -
Concurrency and Computation: Practice and Experience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.309
H-Index - 67
eISSN - 1532-0634
pISSN - 1532-0626
DOI - 10.1002/cpe.5634
Subject(s) - bottleneck , benchmark (surveying) , computer science , task (project management) , data center , big data , turnaround time , resource (disambiguation) , data mining , database , distributed computing , operating system , embedded system , engineering , computer network , geodesy , systems engineering , geography
Summary As demand for data processing and Hadoop data center construction grows, the performance of Hadoop data centers is limited by inappropriate resource utilization. This paper introduces a new method for predicting resource utilization in large‐scale Hadoop clusters. The method adopts a two‐step model: performance simulation of Hadoop applications, followed by resource utilization prediction. For the first step, a new simulator that integrates baseline tests with a multilayered network model is introduced and implemented. For the second step, a resource utilization predictor is proposed. A single task model is derived by analyzing resource utilization patterns, and a parallel‐batch‐task‐based (PBT) model, which represents the behavior of real Hadoop applications by integrating the single task model, is then introduced. Two test scenarios are configured to verify the method. In the data center scenario, Terasort, Wordcount, and Hive are selected as benchmarks; in the virtual machine scenario, Terasort is used as the benchmark. The experiments show that, in most cases, the error between the simulator results and the results from the experimental environment is less than 10%. The results confirm that the method can locate the resource bottleneck of a Hadoop cluster and can support agile cluster configuration for applications with massive data.
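
The summary does not give the equations behind the single task model or the PBT model, so the following is only an illustrative sketch of the general idea, not the authors' implementation. It assumes that identical tasks run in waves limited by a fixed number of slots and that each task uses a constant fraction of one resource. The function names (`batch_utilization`, `relative_error`) and all parameters are hypothetical. The second function shows the kind of relative error check that a "less than 10%" comparison between simulated and measured results implies.

```python
import math


def batch_utilization(num_tasks, slots, task_seconds, per_task_util):
    """Toy parallel-batch model (an assumption, not the paper's PBT model).

    Tasks run in waves of at most `slots` tasks; every task lasts
    `task_seconds` and uses `per_task_util` of one resource's capacity.
    Returns one (start_s, end_s, utilization) tuple per wave.
    """
    waves = math.ceil(num_tasks / slots)
    timeline = []
    for w in range(waves):
        # The last wave may be only partially filled.
        running = min(slots, num_tasks - w * slots)
        # Utilization cannot exceed the resource's full capacity.
        util = min(1.0, running * per_task_util)
        timeline.append((w * task_seconds, (w + 1) * task_seconds, util))
    return timeline


def relative_error(simulated, measured):
    """Relative error of a simulated value against a measured one."""
    return abs(simulated - measured) / measured


if __name__ == "__main__":
    # Hypothetical numbers: 10 tasks, 4 slots, 30 s per task, 20% per task.
    for start, end, util in batch_utilization(10, 4, 30.0, 0.2):
        print(f"{start:6.1f}s - {end:6.1f}s  utilization {util:.0%}")
    # Hypothetical simulated vs. measured job time, in seconds.
    print(f"relative error: {relative_error(108.0, 100.0):.1%}")
```

In a sketch like this, a resource whose predicted utilization stays pinned at full capacity across waves would be the candidate bottleneck; the paper's actual models are presumably richer than this constant-profile assumption.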
