Optimizing data partition for scaling out NoSQL cluster

Author(s) - Huang Xiangdong, Wang Jianmin, Zhong Yu, Song Shaoxu, Yu Philip S.

Publication year - 2015

Publication title - Concurrency and Computation: Practice and Experience

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.309

H-Index - 67

eISSN - 1532-0634

pISSN - 1532-0626

DOI - 10.1002/cpe.3643

Subject(s) - nosql, computer science, partition (number theory), benchmark (surveying), consistent hashing, hash function, cloud computing, shuffling, parallel computing, hash table, data mining, big data, mathematics, geodesy, computer security, combinatorics, double hashing, geography, operating system, programming language

Summary: Data partitioning significantly affects the performance of Not Only SQL (NoSQL) systems. Many peer-to-peer NoSQL systems use consistent hashing to partition data automatically. These systems divide the consistent hashing ring using virtual nodes and random data placement, which can produce an imbalanced partition and degrade overall system performance. The problem is especially prominent when scaling out heterogeneous clusters. This paper first proposes an imbalance coefficient of data distribution for a cluster that takes the capacity of each node into account. Based on this coefficient, a dynamic programming algorithm calculates the position of a newly added node in the consistent hashing ring, expanding the ring more evenly without reshuffling the entire dataset. Simulations and experiments on Cassandra with the Yahoo! Cloud Serving Benchmark (YCSB) show that the algorithm outperforms the state-of-the-art work. Copyright © 2015 John Wiley & Sons, Ltd.
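To make the idea in the summary concrete, here is a minimal Python sketch of a capacity-aware imbalance measure on a consistent hashing ring, plus a search for where to place a new node. It is an illustration under stated assumptions, not the paper's method: the summary does not give the definition of the imbalance coefficient, so `imbalance` below uses a hypothetical stand-in (the largest gap between a node's owned share of the ring and its share of total capacity), and `best_single_token` is a brute-force search over a hand-picked candidate list rather than the dynamic programming algorithm. The ring size `RING`, the function names, and the node names are all invented for the example.

```python
RING = 2**32  # hypothetical ring size; real systems use their own token range


def owned_fractions(tokens):
    """Map each node to the fraction of the ring it owns.

    tokens: dict of node -> list of distinct token positions in [0, RING).
    Each token owns the arc from the previous token (exclusive) up to
    itself (inclusive), wrapping around the ring.
    """
    points = sorted((t, n) for n, ts in tokens.items() for t in ts)
    owned = {n: 0 for n in tokens}
    for i, (t, n) in enumerate(points):
        prev = points[i - 1][0]  # index -1 wraps to the last token
        arc = (t - prev) % RING
        if len(points) == 1:
            arc = RING  # a single token owns the whole ring
        owned[n] += arc
    return {n: v / RING for n, v in owned.items()}


def imbalance(tokens, capacity):
    """Hypothetical imbalance measure (NOT the paper's coefficient):
    the largest absolute gap between owned share and capacity share."""
    frac = owned_fractions(tokens)
    total = sum(capacity.values())
    return max(abs(frac[n] - capacity[n] / total) for n in tokens)


def best_single_token(tokens, capacity, new_node, new_capacity, candidates):
    """Brute-force stand-in for the paper's dynamic programming step:
    try each candidate position and keep the one with the lowest imbalance."""
    cap = {**capacity, new_node: new_capacity}
    best = None
    for c in candidates:
        trial = {**tokens, new_node: [c]}  # existing tokens are left untouched
        score = imbalance(trial, cap)
        if best is None or score < best[1]:
            best = (c, score)
    return best


# Heterogeneous two-node cluster: node "a" has twice the capacity of "b",
# but owns three quarters of the ring.
tokens = {"a": [0], "b": [RING // 4]}
capacity = {"a": 2, "b": 1}
before = imbalance(tokens, capacity)

# Scale out with node "c" (capacity 1) and pick its position.
candidates = [RING // 8, RING // 2, 5 * RING // 8, 3 * RING // 4]
position, after = best_single_token(tokens, capacity, "c", 1, candidates)
print(before, position == RING // 2, after)
```

In this toy case the chosen position splits only the arc owned by "a", so only keys from that one arc would move to the new node and the rest of the ring is untouched, which is the "no full reshuffle" property the summary describes.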
