Open Access
Dataset Reuse: Toward Translating Principles to Practice
Author(s) - Laura Koesten, Pavlos Vougiouklis, Elena Simperl, Paul Groth
Publication year - 2020
Publication title - Patterns
Language(s) - English
Resource type - Journals
ISSN - 2666-3899
DOI - 10.1016/j.patter.2020.100136
Subject(s) - reuse, computer science, reusability, context (archaeology), data science, code (set theory), code reuse, software engineering, world wide web, data mining, software, programming language, engineering, paleontology, set (abstract data type), biology, waste management
Summary - The web provides access to millions of datasets that can have additional impact when used beyond their original context. We have little empirical insight into what makes a dataset more reusable than others and which of the existing guidelines and frameworks, if any, make a difference. In this paper, we explore potential reuse features through a literature review and present a case study on datasets on GitHub, a popular open platform for sharing code and data. We describe a corpus of more than 1.4 million data files, from over 65,000 repositories. Using GitHub's engagement metrics as proxies for dataset reuse, we relate them to reuse features from the literature and devise an initial model, using deep neural networks, to predict a dataset's reusability. This demonstrates the practical gap between principles and actionable insights that allow data publishers and tools designers to implement functionalities that provably facilitate reuse.
