An efficient iterative graph data processing framework based on bulk synchronous parallel model | Zendy

Liu Chao | Zendy; Zeng Deze | Zendy; Yao Hong | Zendy; Yan Xuesong | Zendy; Yu Linchen | Zendy; Fu Zhangjie | Zendy

AI Assistant Blog Pricing

Home ZAIA Blog

Premium

An efficient iterative graph data processing framework based on bulk synchronous parallel model

Author(s) -

Liu Chao,

Zeng Deze,

Yao Hong,

Yan Xuesong,

Yu Linchen,

Fu Zhangjie

Publication year - 2018

Publication title -

concurrency and computation: practice and experience

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.309

H-Index - 67

eISSN - 1532-0634

pISSN - 1532-0626

DOI - 10.1002/cpe.4432

Subject(s) - computer science , graph , big data , graph database , theoretical computer science , hash function , pagerank , distributed computing , load balancing (electrical power) , parallel computing , graph partition , power graph analysis , synchronization (alternating current) , data mining , computer network , channel (broadcasting) , geometry , computer security , mathematics , grid

Summary Graph data processing has been widely applied in a variety of domains such as industry, science, social network, and so on. It therefore has stimulated many efforts devoted to this area. To embrace the fast development trend of big graph data, graph data processing based on Pregel‐like systems has been regarded as one of the most promising ways and has widely attracted the attention of researchers. However, it still remains in its early stage and there still exist many challenges. In Pregel, the superstep synchronization is time consuming as the graph data iteration operation requires multiple synchronizations. Furthermore, the graph data partition strategy adopted by Pregel fails to support load balancing, therefore causing the increase of network I/O overhead as the scale of graph data grows. To address these issues, this paper presents an efficient computational framework for graph data processing based on the bulk synchronous parallel model. The global synchronization control mechanism is improved by determining the start time of the next round of superstep through counting the number of global message files. Furthermore, an improved graph data partition mechanism based on a balanced hash method is proposed to reduce the communication overhead between different partitions of sub‐graph computational tasks. We also re‐design the PageRank algorithm to verify the effectiveness of the proposed framework. Experimental results on different real‐world datasets verify the efficiency of our proposed framework as it outperforms Giraph (an open source Pregel‐like system) by 58 % −69 % , and achieves 10×−17× performance improvement over Hadoop.

This content is not available in your region!

Continue researching here.

Having issues? You can contact us here

Accelerating Research