Premium
Generate‐map‐reduce: An extension to map‐reduce to support shared data and recursive computations
Author(s) -
Dharanipragada Janakiram,
Iyer Geeta,
Kailasam Sriram
Publication year - 2014
Publication title -
concurrency and computation: practice and experience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.309
H-Index - 67
eISSN - 1532-0634
pISSN - 1532-0626
DOI - 10.1002/cpe.3018
Subject(s) - computer science , speedup , parallel computing , computation , cluster analysis , distributed memory , data parallelism , theoretical computer science , shared memory , distributed computing , parallelism (grammar) , algorithm , machine learning
SUMMARY It is difficult to express the parallelism present in complex computations by using existing higher level abstractions such as MapReduce and Dryad. These computations include applications from wide variety of domains, like Artificial Intelligence, Decision Tree Algorithms, Association Rule Mining, Recommender Systems, Graph Algorithms, Clustering Algorithms, Compute Intensive Scientific Workflows, Optimization Algorithms, and so forth. Their execution graphs introduce new challenges in terms of programmer expressibility and runtime performance such as iterative and recursive computations, shared communication model, and so forth. We propose an extension to MapReduce, called Generate‐Map‐Reduce (GMR), targeted towards modeling these applications. GMR introduces a new Generate abstraction into the MapReduce framework that captures recursive computations. The runtime also supports iterative jobs and a distributed communication model by using shared data structures. We illustrate recursive computations with GMR by modeling complex applications such as simulated annealing, A* search, and adaptive quadrature computation that require recursive spawning of new tasks to handle variable degree of parallelism. GMR runtime supports caching of common data across iterations in memory and local disks. We illustrate how this caching helps in achieving significant speedup for iterative computations by modeling k ‐means clustering. Copyright © 2013 John Wiley & Sons, Ltd.