Scripting distributed scientific workflows using Weaver
Author(s) - Bui Peter, Yu Li, Thrasher Andrew, Carmichael Rory, Lanc Irena, Donnelly Patrick, Thain Douglas
Publication year - 2012
Publication title - Concurrency and Computation: Practice and Experience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.309
H-Index - 67
eISSN - 1532-0634
pISSN - 1532-0626
DOI - 10.1002/cpe.1871
Subject(s) - workflow, computer science, scalability, scripting language, Python (programming language), software engineering, software, workflow technology, workflow management system, programming language, domain (mathematical analysis), distributed computing, database, mathematical analysis, mathematics
SUMMARY Weaver is a high‐level distributed computing framework that enables researchers to construct scalable scientific data‐processing workflows. Instead of developing a new workflow language, we introduce a domain‐specific language built on top of Python called Weaver, which takes advantage of users' familiarity with the programming language, minimizes barriers to adoption, and allows for integration with a rich ecosystem of existing software. In this paper, we provide an overview of Weaver's programming model, which allows users to organize and specify scientific workflows by using a collection of datasets, functions, and abstractions. We also explain how these workflow specifications are compiled into a directed acyclic graph that is used by the Makeflow workflow manager to dispatch work to a variety of distributed execution platforms. To demonstrate the power and benefits of using the framework in constructing scientific research applications, the paper examines four distinct real‐world applications scripted using Weaver and analyzes the performance, scalability, and impact of the generated distributed scientific workflows. Copyright © 2011 John Wiley & Sons, Ltd.
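The compilation step described in the summary (datasets, functions, and abstractions turned into a directed acyclic graph for a Make-style workflow manager) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: `Rule`, `map_abstraction`, and `to_make_syntax` are hypothetical names, not Weaver's actual API, and the output only imitates the general Make-like "targets: sources" rule layout rather than reproducing what Weaver really emits.

```python
# Illustrative sketch only: every name here is hypothetical, not Weaver's API.
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Rule:
    """One DAG node: a command that turns input files into output files."""
    inputs: Tuple[str, ...]
    outputs: Tuple[str, ...]
    command: str


def map_abstraction(command_template: str, dataset: List[str], suffix: str) -> List[Rule]:
    """Apply one command to every file in a dataset, yielding independent rules."""
    rules = []
    for path in dataset:
        out = path + suffix
        cmd = command_template.format(input=path, output=out)
        rules.append(Rule((path,), (out,), cmd))
    return rules


def to_make_syntax(rules: List[Rule]) -> str:
    """Serialize rules as Make-style 'targets: sources' lines with tab-indented commands."""
    blocks = [
        f"{' '.join(r.outputs)}: {' '.join(r.inputs)}\n\t{r.command}"
        for r in rules
    ]
    return "\n\n".join(blocks) + "\n"


# A "dataset" is just a list of file names in this sketch.
dataset = ["a.txt", "b.txt"]

# Stage 1: sort each file independently (these rules can run in parallel).
stage1 = map_abstraction("sort {input} > {output}", dataset, ".sorted")

# Stage 2: merge the sorted files; it depends on every stage-1 output,
# which is what makes the rule set a DAG rather than a flat list.
sorted_files = tuple(o for r in stage1 for o in r.outputs)
merged = Rule(sorted_files, ("all.sorted",),
              "sort -m " + " ".join(sorted_files) + " > all.sorted")

workflow_text = to_make_syntax(stage1 + [merged])
print(workflow_text)
```

In this sketch the dependency edges are implied by shared file names: the merge rule lists the stage-1 outputs as its inputs, so a workflow manager can run the two sort rules concurrently and schedule the merge once both finish.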