Experience in using a process language to define scientific workflow and generate dataset provenance
Author(s) -
Leon J. Osterweil,
Lori A. Clarke,
Aaron M. Ellison,
Rodion Podorozhny,
Alexander Wise,
Emery R. Boose,
Julian L. Hadley
Publication year - 2008
Publication title -
citeseer x (the pennsylvania state university)
Language(s) - English
Resource type - Conference proceedings
DOI - 10.1145/1453101.1453147
Subject(s) - computer science , workflow , process (computing) , software engineering , documentation , metadata , work in process , data science , information retrieval , programming language , world wide web , database , marketing , business
This paper describes our experiences in exploring the applicability of software engineering approaches to scientific data management problems. Specifically, this paper describes how process definition languages can be used to expedite production of scientific datasets as well as to generate documentation of their provenance. Our approach uses a process definition language that incorporates powerful semantics to encode scientific processes in the form of a Process Definition Graph (PDG). The paper describes how execution of the PDG-defined process can generate Dataset Derivation Graphs (DDGs), metadata that document how the scientific process developed each of its product datasets. The paper uses an example to show that scientific processes may be complex and to illustrate why some of the more powerful semantic features of the process definition language are useful in supporting clarity and conciseness in representing such processes. This work is similar in goals to work generally referred to as Scientific Workflow. The paper demonstrates the contribution that software engineering can make to this domain.
Accelerating Research
Robert Robinson Avenue,
Oxford Science Park, Oxford
OX4 4GP, United Kingdom
Address
John Eccles HouseRobert Robinson Avenue,
Oxford Science Park, Oxford
OX4 4GP, United Kingdom