Novel data‐placement scheme for improving the data locality of Hadoop in heterogeneous environments
Author(s) - Bae Minho, Yeo Sangho, Park Gyudong, Oh Sangyoon
Publication year - 2020
Publication title - Concurrency and Computation: Practice and Experience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.309
H-Index - 67
eISSN - 1532-0634
pISSN - 1532-0626
DOI - 10.1002/cpe.5752
Subject(s) - locality , computer science , overhead (engineering) , scheme (mathematics) , big data , replication (statistics) , parallel computing , distributed computing , distributed database , distributed file system , database , data mining , operating system , mathematics , mathematical analysis , philosophy , linguistics , statistics
Summary To meet the demands of high‐performance big data processing, parallel and distributed frameworks such as Hadoop are used extensively. In heterogeneous environments, however, Hadoop clusters perform poorly, primarily because data blocks are distributed evenly across all nodes regardless of the differing capabilities of individual nodes, which reduces data locality. A new data‐placement scheme that enhances data locality is therefore required for Hadoop in heterogeneous environments. This article proposes a data‐placement scheme that preserves the same degree of data locality in heterogeneous environments as standard Hadoop while replicating only a small amount of data. In the proposed scheme, only the blocks with the highest probability of being accessed remotely are selected and replicated. Experimental results indicate that the proposed scheme incurs only a 20% disk‐space overhead yet achieves virtually the same data‐locality ratio as standard Hadoop with a replication factor of three, which incurs a 200% disk‐space overhead.
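The selection step described in the abstract can be illustrated as a simple budgeted greedy procedure: rank blocks by their estimated probability of remote access and add an extra replica for the top-ranked blocks until a disk-space budget (for example, 20% of the total data size) is exhausted. The sketch below is illustrative only and is not the paper's implementation; the class names, the remoteAccessProbability field, and the budget parameter are assumptions for this example.

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Minimal sketch of a budgeted, probability-driven block replication policy.
 *  Hypothetical types and fields; not the paper's actual algorithm. */
final class BlockInfo {
    final String blockId;
    final long sizeBytes;
    final double remoteAccessProbability; // estimated chance the block is read remotely

    BlockInfo(String blockId, long sizeBytes, double remoteAccessProbability) {
        this.blockId = blockId;
        this.sizeBytes = sizeBytes;
        this.remoteAccessProbability = remoteAccessProbability;
    }
}

final class SelectiveReplicator {
    /** Select the blocks most likely to be accessed remotely, staying within a
     *  disk-space budget given as a fraction of the total data size
     *  (e.g., 0.20 for the 20% overhead reported in the abstract). */
    static List<BlockInfo> selectBlocksToReplicate(List<BlockInfo> blocks, double budgetFraction) {
        long totalBytes = blocks.stream().mapToLong(b -> b.sizeBytes).sum();
        long budgetBytes = (long) (totalBytes * budgetFraction);

        // Rank blocks by descending estimated remote-access probability.
        List<BlockInfo> sorted = new ArrayList<>(blocks);
        sorted.sort(Comparator.comparingDouble((BlockInfo b) -> b.remoteAccessProbability).reversed());

        // Greedily pick blocks until the extra-replica space budget is used up.
        List<BlockInfo> selected = new ArrayList<>();
        long used = 0;
        for (BlockInfo b : sorted) {
            if (used + b.sizeBytes > budgetBytes) continue; // skip blocks that would exceed the budget
            selected.add(b);
            used += b.sizeBytes;
        }
        return selected; // each selected block would receive one additional replica
    }
}

Under this kind of policy, the extra copies are concentrated on the blocks most likely to cause remote reads, which is how a small replication budget can recover most of the locality that full three-way replication provides.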