Hierarchical MapReduce: towards simplified cross‐domain data processing
Author(s) -
Luo Yuan,
Plale Beth,
Guo Zhenhua,
Li Wilfred W.,
Qiu Judy,
Sun Yiming
Publication year - 2012
Publication title -
Concurrency and Computation: Practice and Experience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.309
H-Index - 67
eISSN - 1532-0634
pISSN - 1532-0626
DOI - 10.1002/cpe.2929
Subject(s) - computer science, scheduling (production processes), computation, distributed computing, parallel computing, programming paradigm, data intensive computing, computer cluster, domain (mathematical analysis), big data, cluster (spacecraft), operating system, algorithm, grid computing, mathematical analysis, mathematics, programming language, operations management, geometry, economics, grid
SUMMARY The MapReduce programming model has proven useful for data‐driven high throughput applications. However, the conventional MapReduce model limits itself to scheduling jobs within a single cluster. As job sizes become larger, single‐cluster solutions grow increasingly inadequate. We present a hierarchical MapReduce framework that utilizes computation resources from multiple clusters simultaneously to run MapReduce jobs across them. Applications implemented in this framework adopt the Map–Reduce–GlobalReduce model, in which computations are expressed as three functions: Map, Reduce, and GlobalReduce. Two scheduling algorithms are proposed, one targeting compute‐intensive jobs and the other data‐intensive jobs; both are evaluated using a life science application, AutoDock, and a simple Grep. Data management is explored through analysis of the Gfarm file system. Copyright © 2012 John Wiley & Sons, Ltd.
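The three-function model named in the summary can be illustrated with a minimal single-process sketch. This is not the authors' framework: the function names (`map_fn`, `reduce_fn`, `global_reduce_fn`, `run_local_job`, `run_hierarchical_job`), the cluster names, and the sample data are all hypothetical, and the clusters are simulated as in-memory lists. It assumes a Grep-like job that counts lines containing a pattern, with each cluster reducing its own partition and a final GlobalReduce combining the per-cluster results.

```python
from collections import defaultdict


def map_fn(line, pattern):
    """Map: emit (pattern, 1) for each input line that contains the pattern."""
    if pattern in line:
        yield pattern, 1


def reduce_fn(key, values):
    """Reduce: combine the values for one key inside a single cluster."""
    return key, sum(values)


def global_reduce_fn(key, partials):
    """GlobalReduce: combine the per-cluster results for one key."""
    return key, sum(partials)


def run_local_job(lines, pattern):
    """Simulate one cluster running Map and Reduce over its own partition."""
    grouped = defaultdict(list)
    for line in lines:
        for key, value in map_fn(line, pattern):
            grouped[key].append(value)
    return dict(reduce_fn(key, values) for key, values in grouped.items())


def run_hierarchical_job(partitions, pattern):
    """Run a local job per simulated cluster, then apply GlobalReduce."""
    grouped = defaultdict(list)
    for lines in partitions.values():
        for key, value in run_local_job(lines, pattern).items():
            grouped[key].append(value)
    return dict(global_reduce_fn(key, values) for key, values in grouped.items())


# Hypothetical input: two clusters, each holding its own log lines.
partitions = {
    "cluster_a": ["error: disk full", "ok", "error: timeout"],
    "cluster_b": ["ok", "error: retry"],
}
result = run_hierarchical_job(partitions, "error")
print(result)
```

In this sketch only the small per-cluster summaries reach the GlobalReduce step, which is the property that makes a hierarchical arrangement attractive when moving raw input between clusters is expensive.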