Rapid development of cloud-native intelligent data pipelines for scientific data streams using the HASTE Toolkit
Author(s) -
Ben Blamey,
Salman Toor,
Martin Dahlö,
Håkan Wieslander,
Philip J. Harrison,
Ida-Maria Sintorn,
Alan Sabirsh,
Carolina Wählby,
Ola Spjuth,
Andreas Hellander
Publication year - 2021
Publication title -
GigaScience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 2.947
H-Index - 54
ISSN - 2047-217X
DOI - 10.1093/gigascience/giab018
Subject(s) - cloud computing, computer science, data science, streams, data stream mining, pipeline transport, data mining, engineering, operating system, environmental engineering
Large streamed datasets, characteristic of life science applications, are often resource-intensive to process, transport and store. We propose a pipeline model, a design pattern for scientific pipelines, where an incoming stream of scientific data is organized into a tiered or ordered "data hierarchy". We introduce the HASTE Toolkit, a proof-of-concept cloud-native software toolkit based on this pipeline model, to partition and prioritize data streams to optimize use of limited computing resources.
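The idea of organizing a stream into a tiered "data hierarchy" can be illustrated with a minimal sketch. This is not the HASTE Toolkit's API: the names `StreamItem`, `nonzero_fraction`, and `assign_tiers`, the scoring rule, and the threshold values are all hypothetical, chosen only to show how a per-item score could partition a stream so that limited resources are spent on the highest-priority tier first.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass
class StreamItem:
    """One object arriving on the stream (hypothetical type for this sketch)."""
    name: str
    payload: bytes


def nonzero_fraction(item: StreamItem) -> float:
    """Toy score: fraction of non-zero bytes (an assumed stand-in for a real metric)."""
    if not item.payload:
        return 0.0
    return sum(1 for b in item.payload if b != 0) / len(item.payload)


def assign_tiers(
    items: Iterable[StreamItem],
    score: Callable[[StreamItem], float],
    thresholds: List[float],
) -> Dict[int, List[StreamItem]]:
    """Partition items into tiers; tier 0 is the highest priority.

    `thresholds` must be sorted in descending order. An item lands in the
    first tier whose cutoff its score meets, or in the last tier otherwise.
    """
    tiers: Dict[int, List[StreamItem]] = {i: [] for i in range(len(thresholds) + 1)}
    for item in items:
        s = score(item)
        tier = len(thresholds)  # default: lowest-priority tier
        for i, cutoff in enumerate(thresholds):
            if s >= cutoff:
                tier = i
                break
        tiers[tier].append(item)
    return tiers


stream = [
    StreamItem("a", b"\x01\x01\x01\x01"),
    StreamItem("b", b"\x01\x00\x00\x00"),
    StreamItem("c", b"\x00\x00\x00\x00"),
]
hierarchy = assign_tiers(stream, nonzero_fraction, thresholds=[0.75, 0.2])
tier_names = {tier: [item.name for item in members] for tier, members in hierarchy.items()}
```

In a setup like this, tier 0 might be processed or stored immediately, while lower tiers are deferred, compressed, or discarded, depending on the resources available.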