Open Access
Towards Hypothetical Reasoning Using Distributed Provenance
Author(s) -
Daniel Deutch,
Yuval Moskovitch,
Itay Polak,
Noam Rinetzky
Publication year - 2018
Language(s) - English
DOI - 10.5441/002/edbt.2018.47
Hypothetical reasoning is the iterative examination of how modifications to the data affect the result of some computation or data analysis query. Data scientists commonly perform this kind of reasoning to gain insights. Previous work has indicated that fine-grained data provenance can be instrumental for performing hypothetical reasoning efficiently: instead of a costly re-execution of the underlying application, one may assign values to a pre-computed provenance expression. However, current techniques for fine-grained provenance tracking are ill-suited for large-scale data because of the overhead they impose on both execution time and memory consumption. We outline an approach to hypothetical reasoning for large-scale data. Our key insights are: (i) tracking only the relevant parts of the provenance, based on an a priori specification of the classes of hypothetical scenarios that are of interest, and (ii) distributed tracking of provenance, tailored to fit distributed data processing frameworks such as Apache Spark. We also discuss the challenges in both respects and our initial directions for addressing them.
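To make the idea of assigning values to a pre-computed provenance expression concrete, here is a minimal single-machine sketch in Python. It is an illustration only, not the authors' system. The toy rows, the SUM query, the choice of one provenance variable per category, and the function names (`build_provenance`, `evaluate`, `rerun`) are all assumptions introduced for this example. The query runs once and keeps a symbolic coefficient per category. A hypothetical scenario such as "scale all electronics revenue by 0.5" is then answered by evaluating that expression rather than re-reading the data.

```python
from collections import defaultdict

# Hypothetical toy data: (product, category, revenue).
ROWS = [
    ("laptop", "electronics", 1200.0),
    ("phone", "electronics", 800.0),
    ("desk", "furniture", 300.0),
    ("chair", "furniture", 100.0),
]


def build_provenance(rows):
    """Run the SUM(revenue) query once, keeping one variable per category.

    The result encodes the linear expression
    sum(coefficient[category] * variable[category]).
    Only category-level variables are tracked, on the assumption that
    the scenarios of interest modify whole categories.
    """
    coefficients = defaultdict(float)
    for _product, category, revenue in rows:
        coefficients[category] += revenue
    return dict(coefficients)


def evaluate(provenance, assignment):
    """Assign values to provenance variables.

    Variables missing from the assignment default to 1.0 (no change).
    """
    return sum(
        coeff * assignment.get(var, 1.0) for var, coeff in provenance.items()
    )


def rerun(rows, assignment):
    """Baseline for comparison: re-execute the query over modified data."""
    return sum(
        revenue * assignment.get(category, 1.0)
        for _product, category, revenue in rows
    )


provenance = build_provenance(ROWS)

# Scenario 1: electronics revenue is halved.
halved = {"electronics": 0.5}
# Scenario 2: all furniture rows are deleted (variable set to 0).
no_furniture = {"furniture": 0.0}

for scenario in ({}, halved, no_furniture):
    print(scenario, evaluate(provenance, scenario), rerun(ROWS, scenario))
```

The provenance here has one entry per category rather than one per row, which loosely mirrors insight (i): what is tracked depends on which classes of scenarios are declared in advance. A scenario that changes a single product could not be answered from this expression and would need finer-grained tracking.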
