Open Access
Realising Data-Centric Scientific Workflows with Provenance-Capturing on Data Lakes
Author(s) - Hendrik Nolte, Philipp Wieder
Publication year - 2022
Publication title - Data Intelligence
Language(s) - English
Resource type - Journals
eISSN - 2096-7004
pISSN - 2641-435X
DOI - 10.1162/dint_a_00141
Subject(s) - metadata , computer science , workflow , reusability , data science , schema (genetic algorithms) , data mapping , usability , architecture , data type , software engineering , database , world wide web , information retrieval , software , human–computer interaction , programming language , art , visual arts
Since their introduction by James Dixon in 2010, data lakes have received increasing attention, driven by the promise of high reusability of the stored data due to their schema-on-read semantics. Building on this idea, the literature has discussed several additional requirements to improve the general usability of the concept, such as a central metadata catalog including all provenance information, overarching data governance, or integration with (high-performance) processing capabilities. Although the need for both a logical and a physical organisation of data lakes to meet these requirements is widely recognised, no concrete guidelines have yet been provided. The most common architecture implementing this conceptual organisation is the zone architecture, in which data is assigned to a zone depending on its degree of processing. This paper discusses how FAIR Digital Objects can be used in a novel approach to organise a data lake based on data types instead of zones, how they can abstract the physical implementation, and how they enable generic and portable processing capabilities through a provenance-based approach.