Optimizing data partition for scaling out NoSQL cluster
Author(s) - Huang Xiangdong, Wang Jianmin, Zhong Yu, Song Shaoxu, Yu Philip S.
Publication year - 2015
Publication title - Concurrency and Computation: Practice and Experience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.309
H-Index - 67
eISSN - 1532-0634
pISSN - 1532-0626
DOI - 10.1002/cpe.3643
Subject(s) - nosql, computer science, partition (number theory), benchmark (surveying), consistent hashing, hash function, cloud computing, shuffling, parallel computing, hash table, data mining, big data, mathematics, geodesy, computer security, combinatorics, double hashing, geography, operating system, programming language
Summary - Data partitioning significantly affects the performance of Not Only SQL (NoSQL) systems. Many peer‐to‐peer NoSQL systems use consistent hashing to partition data automatically. These systems divide the consistent hashing ring using virtual nodes and random data placement, which can lead to imbalanced data partitions and degrade overall system performance. The problem is especially prominent when scaling out heterogeneous clusters. This paper first proposes an imbalance coefficient of data distribution for a cluster that takes the capacity of each node into account. Based on this coefficient, we propose a dynamic programming algorithm that calculates the position of a newly added node in the consistent hashing ring, expanding the ring more evenly without reshuffling the entire dataset. Simulations and experiments on Cassandra with the Yahoo! Cloud Serving Benchmark (YCSB) show that our algorithm outperforms the state‐of‐the‐art work. Copyright © 2015 John Wiley & Sons, Ltd.
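
The idea of placing a new node by measuring capacity-aware imbalance can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the summary does not define the paper's imbalance coefficient, so the sketch uses a stand-in (the largest gap between a node's share of the ring and its share of total capacity), and it replaces the paper's dynamic programming algorithm with a plain brute-force scan over candidate positions. The names `RING_SIZE`, `ownership`, `imbalance`, and `best_token` are hypothetical, and each node holds a single token (virtual nodes are not modeled).

```python
RING_SIZE = 2**32  # hypothetical token space; real systems differ


def ownership(tokens):
    """Map each node to the fraction of the ring it owns.

    tokens: dict {token: node_name}. A token owns the arc from the
    previous token (exclusive) up to itself (inclusive), wrapping around.
    """
    ordered = sorted(tokens)
    shares = {}
    for i, tok in enumerate(ordered):
        prev = ordered[i - 1]  # i == 0 wraps around to the last token
        # A lone token yields arc 0 after the modulo; it owns the whole ring.
        arc = (tok - prev) % RING_SIZE or RING_SIZE
        node = tokens[tok]
        shares[node] = shares.get(node, 0.0) + arc / RING_SIZE
    return shares


def imbalance(tokens, capacity):
    """Stand-in imbalance measure (an assumption, not the paper's formula):
    the largest absolute gap between ring share and capacity share."""
    shares = ownership(tokens)
    total = sum(capacity.values())
    return max(abs(shares.get(n, 0.0) - capacity[n] / total) for n in capacity)


def best_token(tokens, capacity, new_node, new_capacity, steps=1024):
    """Brute-force scan (not the paper's dynamic programming): try evenly
    spaced candidate tokens and keep the first one with the lowest imbalance.
    Existing tokens are never moved, so no data is reshuffled between
    the nodes already in the ring."""
    cap = dict(capacity)
    cap[new_node] = new_capacity
    best = None
    for k in range(steps):
        cand = k * RING_SIZE // steps
        if cand in tokens:
            continue  # position already taken
        trial = dict(tokens)
        trial[cand] = new_node
        score = imbalance(trial, cap)
        if best is None or score < best[1]:
            best = (cand, score)
    return best


if __name__ == "__main__":
    ring = {0: "a", 2**31: "b"}
    caps = {"a": 1, "b": 1}
    print(best_token(ring, caps, "c", 2))
```

With one token per node, the new node can only take data from a single neighbor, so a higher-capacity newcomer cannot reach a perfectly balanced state in one step. This limitation of the sketch is one reason real systems assign several virtual nodes per physical node.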
