Open Access
Rapid development of cloud-native intelligent data pipelines for scientific data streams using the HASTE Toolkit
Author(s) - Ben Blamey, Salman Toor, Martin Dahlö, Håkan Wieslander, Philip J. Harrison, IdaMaria Sintorn, Alan Sabirsh, Carolina Wählby, Ola Spjuth, Andreas Hellander
Publication year - 2021
Publication title - GigaScience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 2.947
H-Index - 54
ISSN - 2047-217X
DOI - 10.1093/gigascience/giab018
Subject(s) - cloud computing, computer science, data science, streams, data stream mining, pipeline transport, data mining, engineering, operating system, environmental engineering
Large streamed datasets, characteristic of life science applications, are often resource-intensive to process, transport and store. We propose a pipeline model, a design pattern for scientific pipelines, where an incoming stream of scientific data is organized into a tiered or ordered "data hierarchy". We introduce the HASTE Toolkit, a proof-of-concept cloud-native software toolkit based on this pipeline model, to partition and prioritize data streams to optimize use of limited computing resources.
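The abstract describes the pipeline model only at a high level. As a rough illustration of organizing a stream into a tiered "data hierarchy" and prioritizing it, here is a minimal sketch under stated assumptions. Each incoming item gets a numeric score from a user-supplied function. It is then placed into a tier by fixed score thresholds and handed to processing highest-score first. The names `TieredStream`, `assign_tier` and `score_fn`, and the threshold-based tiering itself, are hypothetical. This is not the HASTE Toolkit's API or the authors' implementation.

```python
import heapq
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Sequence


@dataclass(order=True)
class _QueuedItem:
    # heapq is a min-heap, so the negated score is stored to pop the highest score first;
    # seq breaks ties between equal scores in arrival order.
    neg_score: float
    seq: int
    item_id: str = field(compare=False)


def assign_tier(score: float, thresholds: Sequence[float]) -> int:
    """Map a score to a tier index; tier 0 is the top of this hypothetical hierarchy.

    `thresholds` must be sorted from highest to lowest cutoff.
    """
    for tier, cutoff in enumerate(thresholds):
        if score >= cutoff:
            return tier
    return len(thresholds)  # below every cutoff: lowest tier


class TieredStream:
    """Hypothetical helper: scores incoming items, records their tier, queues them by score."""

    def __init__(self, score_fn: Callable[[List[float]], float], thresholds: Sequence[float]):
        self.score_fn = score_fn
        self.thresholds = sorted(thresholds, reverse=True)  # highest cutoff first
        self.tiers: Dict[int, List[str]] = {}
        self._heap: List[_QueuedItem] = []
        self._seq = 0

    def ingest(self, item_id: str, payload: List[float]) -> int:
        """Score one streamed item, file it under a tier and queue it; return the tier."""
        score = self.score_fn(payload)
        tier = assign_tier(score, self.thresholds)
        self.tiers.setdefault(tier, []).append(item_id)
        heapq.heappush(self._heap, _QueuedItem(-score, self._seq, item_id))
        self._seq += 1
        return tier

    def next_to_process(self) -> Optional[str]:
        """Pop the highest-scoring queued item, or return None if the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap).item_id


if __name__ == "__main__":
    # Toy payloads; the mean value stands in for a real domain-specific score.
    stream = TieredStream(score_fn=lambda p: sum(p) / len(p), thresholds=[0.8, 0.4])
    for i, payload in enumerate([[0.9, 0.95], [0.1, 0.2], [0.5, 0.6]]):
        print(f"img{i} -> tier {stream.ingest(f'img{i}', payload)}")
    print([stream.next_to_process() for _ in range(3)])
```

Swapping in a different `score_fn` changes which items land in the top tier without touching the queueing logic. When computing resources are limited, only the front of the queue needs to be processed promptly. A real cloud deployment would also need persistence and distribution across workers, which this sketch omits.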
