Nonintrusive collection and management of data provenance in scientific workflows
Author(s) - Tylissanakis Giorgos, Cotronis Yiannis
Publication year - 2012
Publication title - Concurrency and Computation: Practice and Experience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.309
H-Index - 67
eISSN - 1532-0634
pISSN - 1532-0626
DOI - 10.1002/cpe.2809
Subject(s) - workflow, computer science, workflow engine, database, workflow technology, xpdl, dependency graph, graph database, graph, graph traversal, information retrieval, tree traversal, workflow management system, windows workflow foundation, data mining, data science, world wide web, theoretical computer science, programming language
SUMMARY In this paper, we introduce an efficient mechanism to collect, store, and retrieve data provenance information in workflows of multiphysics simulations. Using notifications, we collect information about workflow events nonintrusively during workflow execution. Combining these events with workflow structure information, which is constant for every execution of a workflow, we obtain the data provenance information for a specific run of the workflow. Data provenance information is structured into a graph that represents workflow events on the basis of their causal dependency. We store this graph in a graph database and use the traversal framework it provides to retrieve data provenance information efficiently, traversing backwards from a data object to every workflow event that is part of its provenance. Finally, we integrate data provenance information with the semantics of workflow services to provide complete and meaningful data provenance information. Copyright © 2012 John Wiley & Sons, Ltd.
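The backward traversal described in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a plain in-memory dictionary in place of a graph database, and the class `ProvenanceGraph`, its methods, and the file and event names are all hypothetical, chosen only for illustration. Each edge points from an effect (a data object or workflow event) to one of its direct causes, and a breadth-first walk along those edges collects everything in the provenance of a given data object.

```python
from collections import deque


class ProvenanceGraph:
    """Hypothetical in-memory stand-in for a provenance graph store."""

    def __init__(self):
        # Maps each node to the list of nodes it causally depends on.
        self._causes = {}

    def add_dependency(self, effect, cause):
        """Record that `effect` causally depends on `cause`."""
        self._causes.setdefault(effect, []).append(cause)
        self._causes.setdefault(cause, [])

    def provenance(self, data_object):
        """Return every node reachable by walking backwards from `data_object`."""
        seen = {data_object}
        order = []
        queue = deque([data_object])
        while queue:
            node = queue.popleft()
            for cause in self._causes.get(node, []):
                if cause not in seen:
                    seen.add(cause)
                    order.append(cause)
                    queue.append(cause)
        return order


# Illustrative two-step workflow: a mesher feeds a solver (names are assumptions).
graph = ProvenanceGraph()
graph.add_dependency("mesh.dat", "event:mesher_finished")
graph.add_dependency("event:mesher_finished", "geometry.dat")
graph.add_dependency("result.dat", "event:solver_finished")
graph.add_dependency("event:solver_finished", "mesh.dat")
graph.add_dependency("event:solver_finished", "params.cfg")

lineage = graph.provenance("result.dat")
print(lineage)
```

In this sketch the provenance of `result.dat` includes both workflow events and all three upstream data objects, while the provenance of `geometry.dat`, a workflow input, is empty. A graph database would replace the dictionary and the hand-written loop with its own storage and traversal facilities.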