Content-aware data distribution over cluster nodes

Open Access

Author(s) - Adam Krechowicz

Publication year - 2021

Publication title - Intelligent Data Analysis

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.231

H-Index - 47

eISSN - 1571-4128

pISSN - 1088-467X

DOI - 10.3233/ida-205360

Subject(s) - computer science, scalability, cluster analysis, big data, data mining, set (abstract data type), node (physics), data set, distributed computing, distributed database, database, artificial intelligence, engineering, structural engineering, programming language

Proper distribution of data items can significantly improve the performance of data processing in a distributed environment. However, typical data storage systems and distributed computational frameworks pay no special attention to this aspect. In this paper, the author introduces two custom data item addressing methods for distributed data storage, using the Scalable Distributed Two-Layer Datastore as an example. The basic idea of these methods is to ensure that data items stored on the same cluster node are similar to each other, following the concepts of data clustering. However, most data clustering mechanisms have serious problems with scalability, which is a severe limitation in Big Data applications. The proposed methods make it possible to distribute a data set efficiently over a set of buckets. The experimental results show that all proposed methods produce good results efficiently in comparison with traditional clustering techniques such as k-means, agglomerative, and BIRCH clustering. Experiments in a distributed environment showed that proper data distribution can significantly improve the effectiveness of Big Data processing.
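The abstract does not spell out the two addressing methods, so the following is only a minimal sketch of the general idea of content-aware placement, not the author's algorithm. It assumes each data item is a numeric feature vector and each bucket (cluster node) is represented by a hypothetical reference point called a pivot; an item is sent to the bucket whose pivot is nearest, so similar items end up together. The names `nearest_bucket`, `distribute`, and `pivots` are illustrative and do not come from the paper.

```python
from math import dist  # Euclidean distance, available since Python 3.8


def nearest_bucket(item, pivots):
    """Return the index of the pivot closest to `item`."""
    return min(range(len(pivots)), key=lambda i: dist(item, pivots[i]))


def distribute(items, pivots):
    """Place each item in the bucket of its nearest pivot.

    Assumption: one pivot per bucket, chosen in advance. How the pivots
    are selected is outside the scope of this sketch.
    """
    buckets = [[] for _ in pivots]
    for item in items:
        buckets[nearest_bucket(item, pivots)].append(item)
    return buckets


# Hypothetical 2-D feature vectors and two buckets.
pivots = [(0.0, 0.0), (10.0, 10.0)]
items = [(0.5, 1.0), (9.0, 9.5), (1.5, 0.2), (11.0, 10.0)]
buckets = distribute(items, pivots)
print(buckets)
```

Each placement costs one distance computation per pivot and never compares an item against other items, which is one plausible reason an addressing-style approach could scale better than clustering the full data set. Hash-based placement, by contrast, ignores item content entirely.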
