MEMoMR: Accelerate MapReduce via reuse of intermediate results
Author(s) - Yao Hong, Xu Jinlai, Luo Zhongwen, Zeng Deze
Publication year - 2015
Publication title - Concurrency and Computation: Practice and Experience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.309
H-Index - 67
eISSN - 1532-0634
pISSN - 1532-0626
DOI - 10.1002/cpe.3702
Subject(s) - computer science, reuse, scalability, cloud computing, overhead (engineering), big data, metadata, distributed computing, programming paradigm, mechanism (biology), database, operating system, programming language, ecology, biology, philosophy, epistemology
Summary - MapReduce is widely regarded as a flexible, scalable, and easy-to-use distributed programming paradigm for big data processing, such as social network data analysis on cloud computing platforms. In anticipation of the big data era, much effort has been devoted to accelerating MapReduce from different angles, notably by reusing intermediate results, as Dache does. In this paper, we observe that the existing intermediate result reuse mechanism is not efficient enough because it wastes many I/O operations, and that reusing intermediate results more efficiently could improve MapReduce performance. Motivated by this observation, we propose a framework named MEMoMR (more efficient intermediate result reusing for MapReduce) that introduces a novel reuse mechanism to substantially reduce the I/O overhead. To this end, we devise a new metadata description method and apply it in the reuse phase. We implement MEMoMR and evaluate its performance on a real cluster. The experimental results show that MEMoMR improves system performance by as much as 23.4% compared with Dache. Copyright © 2015 John Wiley & Sons, Ltd.
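The general idea of reusing intermediate results can be illustrated with a minimal sketch. This is not the MEMoMR or Dache implementation: the summary does not describe the metadata format, so the key used here (a digest of the input split plus an operation name), the in-memory cache, and all function names are assumptions chosen for illustration only.

```python
import hashlib
from collections import Counter

# Hypothetical in-memory cache: metadata key -> stored map output.
# A real system would keep these results in distributed storage.
_cache = {}
stats = {"hits": 0, "misses": 0}


def metadata_key(split_text, operation):
    """Describe an intermediate result by input digest and operation name (assumed scheme)."""
    digest = hashlib.sha256(split_text.encode("utf-8")).hexdigest()
    return (digest, operation)


def map_word_count(split_text):
    """Map phase: count the words in one input split."""
    return Counter(split_text.split())


def cached_map(split_text):
    """Return the stored map output if this split was processed before."""
    key = metadata_key(split_text, "word_count")
    if key in _cache:
        stats["hits"] += 1
        return _cache[key]
    stats["misses"] += 1
    result = map_word_count(split_text)
    _cache[key] = result
    return result


def run_job(splits):
    """Reduce phase: merge the per-split counts into one total."""
    total = Counter()
    for split in splits:
        total.update(cached_map(split))
    return total
```

In this sketch, a second job that shares an input split with an earlier job skips the map work for that split and only reads the stored result, which is the kind of saving that intermediate result reuse aims for.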