HaDaap: A hotness‐aware data placement strategy for improving storage efficiency in heterogeneous Hadoop clusters

Xiong Runqun; Du Yao; Jin Jiahui; Luo Junzhou

Publication year - 2018

Publication title - Concurrency and Computation: Practice and Experience

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.309

H-Index - 67

eISSN - 1532-0634

pISSN - 1532-0626

DOI - 10.1002/cpe.4830

Subject(s) - computer science , erasure code , replication (statistics) , data redundancy , big data , data center , distributed data store , sort , redundancy (engineering) , computer data storage , distributed computing , database , data mining , operating system , algorithm , statistics , decoding methods , mathematics

Summary - Enterprises increasingly use the Hadoop Distributed File System (HDFS) to manage and store big data for many applications. However, HDFS uses triple replication by default, which leads to staggering data center storage costs. As big data grows in volume and its heat levels become more sensitive, there comes a point where storing so much cold data actually makes it less accessible and more expensive. Meanwhile, as data centers expand, node heterogeneity also becomes an issue: the rack‐aware data placement adopted by HDFS ignores the heterogeneity of data nodes, which results in unbalanced load and uneven resource allocation. Here, we address these problems by proposing a hotness‐aware data placement strategy named HaDaap. HaDaap first applies a hotness‐aware data clustering algorithm to determine each data item's degree of heat. Cold data, made redundant with erasure coding rather than replication, are then placed by a Double Sort Exchange algorithm to reduce storage costs and increase data availability. Finally, hot data are placed by a dynamic replication placement mechanism that jointly accounts for availability, load, and storage costs. Experimental results show that, by accounting for differences in data hotness in heterogeneous Hadoop clusters, HaDaap uses resources rationally and substantially reduces storage costs.
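To make the storage-cost argument concrete, here is a minimal Python sketch. It is not HaDaap itself and rests on several assumptions. Per-file access counts serve as the hotness signal, and a simple two-cluster 1-D k-means stands in for the paper's hotness‐aware clustering algorithm, whose details are not given here. Cold data are assumed to use a Reed-Solomon (6,3) erasure code, i.e. 1.5x raw storage versus 3x under triple replication. The names FileStat, two_means, classify, and raw_storage_gb are hypothetical. The sketch does not model Double Sort Exchange, dynamic replica placement, or node heterogeneity.

```python
from dataclasses import dataclass

# Assumed redundancy parameters (illustrative, not taken from the paper).
REPLICATION_FACTOR = 3      # HDFS default: three replicas per block
EC_DATA, EC_PARITY = 6, 3   # hypothetical Reed-Solomon (6,3) layout for cold data


@dataclass
class FileStat:
    """Hypothetical per-file record: size and recent access count."""
    name: str
    size_gb: float
    accesses: int


def two_means(values, iters=20):
    """1-D k-means with k=2, a stand-in for HaDaap's clustering step.

    Seeds the centroids at min and max, then returns the midpoint
    between the converged low and high centroids as the hot/cold threshold.
    """
    lo, hi = min(values), max(values)
    for _ in range(iters):
        mid = (lo + hi) / 2
        low = [v for v in values if v <= mid]
        high = [v for v in values if v > mid]
        if not low or not high:
            break  # all values identical: nothing to split
        new_lo, new_hi = sum(low) / len(low), sum(high) / len(high)
        if (new_lo, new_hi) == (lo, hi):
            break  # centroids converged
        lo, hi = new_lo, new_hi
    return (lo + hi) / 2


def classify(files):
    """Split files into hot and cold by access count."""
    threshold = two_means([f.accesses for f in files])
    hot = [f for f in files if f.accesses > threshold]
    cold = [f for f in files if f.accesses <= threshold]
    return hot, cold


def raw_storage_gb(hot, cold):
    """Raw capacity: hot data triple-replicated, cold data erasure-coded."""
    hot_gb = sum(f.size_gb for f in hot) * REPLICATION_FACTOR
    cold_gb = sum(f.size_gb for f in cold) * (EC_DATA + EC_PARITY) / EC_DATA
    return hot_gb + cold_gb


files = [
    FileStat("a", 10.0, 500),
    FileStat("b", 10.0, 450),
    FileStat("c", 10.0, 3),
    FileStat("d", 10.0, 1),
    FileStat("e", 10.0, 0),
]
hot, cold = classify(files)
print([f.name for f in hot])                                # ['a', 'b']
print(sum(f.size_gb for f in files) * REPLICATION_FACTOR)   # 150.0
print(raw_storage_gb(hot, cold))                            # 105.0
```

With these made-up sample files, keeping everything at triple replication needs 150 GB of raw storage, while the hybrid scheme needs 105 GB. Real savings depend on the hot/cold mix and on the erasure-code parameters chosen.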
