Datastorr: a workflow and package for delivering successive versions of 'evolving data' directly into R
Author(s) - Daniel S. Falster, Richard G. FitzJohn, Matthew W. Pennell, William K. Cornwell
Publication year - 2019
Publication title - GigaScience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 2.947
H-Index - 54
ISSN - 2047-217X
DOI - 10.1093/gigascience/giz035
Subject(s) - workflow , computer science , software versioning , data sharing , process (computing) , software , software engineering , cornerstone , data science , data mining , information retrieval , database , programming language , medicine , art , alternative medicine , pathology , visual arts
The sharing and re-use of data has become a cornerstone of modern science. Multiple platforms now allow easy publication of datasets. So far, however, platforms for data sharing offer limited functions for distributing and interacting with evolving datasets: those that continue to grow over time as records are added, errors are fixed, and new data structures are created. In this article, we describe a workflow for maintaining and distributing successive versions of an evolving dataset, allowing users to retrieve and load different versions directly into the R platform. Our workflow uses the tools and platforms employed to develop and distribute successive versions of open source software, including version control, GitHub, and semantic versioning, and applies them to the analogous process of developing successive versions of an open source dataset. Moreover, we argue that this model allows individual research groups to achieve dynamic, versioned data delivery at no cost.
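
To make the role of semantic versioning concrete, the following is a minimal sketch of how a client might choose between successive dataset releases tagged MAJOR.MINOR.PATCH. It is written in Python for illustration only and is not the datastorr implementation or its API; the function names and the list of release tags are hypothetical.

```python
from typing import List, Optional, Tuple


def parse_version(tag: str) -> Tuple[int, int, int]:
    """Split a 'MAJOR.MINOR.PATCH' tag (optional leading 'v') into integers."""
    major, minor, patch = tag.lstrip("v").split(".")
    return int(major), int(minor), int(patch)


def latest_version(tags: List[str]) -> str:
    """Return the newest tag, comparing components numerically, not as text."""
    return max(tags, key=parse_version)


def resolve_version(tags: List[str], requested: Optional[str] = None) -> str:
    """Pick the requested release, or the newest one when none is requested."""
    if requested is None:
        return latest_version(tags)
    if requested not in tags:
        raise ValueError(f"unknown dataset version: {requested}")
    return requested


# Hypothetical release tags for an evolving dataset (assumed for illustration).
releases = ["0.1.0", "0.2.0", "0.10.0", "1.0.0", "1.0.1"]

print(resolve_version(releases))           # newest release
print(resolve_version(releases, "0.2.0"))  # a pinned earlier release
```

Comparing the components as integers matters because a plain string comparison would rank "0.9.0" above "0.10.0". Pinning an explicit version lets an analysis be rerun against exactly the data it was first run on, while the default picks up later corrections and additions.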
