A scalable framework for continuous query evaluations over multidimensional, scientific datasets
Author(s) -
Tolooee Cameron,
Malensek Matthew,
Pallickara Sangmi Lee
Publication year - 2015
Publication title -
Concurrency and Computation: Practice and Experience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.309
H-Index - 67
eISSN - 1532-0634
pISSN - 1532-0626
DOI - 10.1002/cpe.3651
Subject(s) - computer science, cache, scalability, distributed computing, preprocessor, database, data mining, parallel computing, artificial intelligence
Summary Efficient access to voluminous multidimensional datasets is essential for scientific applications. Fast‐evolving datasets present unique challenges during retrieval: keeping query results up‐to‐date can be expensive and may involve repeated data queries, excessive data movement, and redundant data preprocessing. This paper focuses on the efficient manipulation of query results when the dataset is continuously evolving. Our approach provides an automated, scalable tracking and caching mechanism for evaluating continuous queries over data stored in a distributed storage system. We have designed and developed a distributed updatable cache that ensures that query output contains the most recent data arrivals. To address the strain that intensive memory requirements place on caching capacity, we have also developed a dormant cache framework. The data to be stored in the dormant cache are selected using the cached continuous query scheduling algorithm that we have designed and developed. We evaluate this approach in the context of Galileo, our distributed data storage framework. The paper includes an empirical evaluation performed on an Amazon Web Services cluster and a private cluster. Our performance benchmarks demonstrate the efficacy of our approach. Copyright © 2015 John Wiley & Sons, Ltd.
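
The idea of an updatable cache backed by a dormant tier can be illustrated with a minimal, single-process sketch. It is not the paper's implementation and does not use Galileo's API: the class name, the method names, and the record layout are all hypothetical, and a plain least-recently-used policy stands in for the cached continuous query scheduling algorithm, which the summary does not describe. The dormant tier here is simply a second in-memory dictionary; a real system would presumably place it on cheaper storage.

```python
from collections import OrderedDict
from typing import Callable, Dict, List

Record = Dict[str, float]
Predicate = Callable[[Record], bool]


class ContinuousQueryCache:
    """Toy updatable cache (hypothetical; not the Galileo implementation)."""

    def __init__(self, max_active: int) -> None:
        if max_active < 1:
            raise ValueError("max_active must be at least 1")
        self.max_active = max_active
        self.predicates: Dict[str, Predicate] = {}
        # Active tier, ordered from least to most recently used.
        self.active: "OrderedDict[str, List[Record]]" = OrderedDict()
        # Dormant tier: results demoted to relieve the active tier.
        self.dormant: Dict[str, List[Record]] = {}

    def register(self, name: str, predicate: Predicate) -> None:
        """Register a continuous query and start tracking its results."""
        self.predicates[name] = predicate
        self.active[name] = []
        self._demote_if_full()

    def ingest(self, record: Record) -> None:
        """Push a new arrival into every matching result set, so cached
        results stay current without re-running the query."""
        for name, predicate in self.predicates.items():
            if predicate(record):
                tier = self.active if name in self.active else self.dormant
                tier[name].append(record)

    def results(self, name: str) -> List[Record]:
        """Return current results, promoting a dormant query if needed."""
        if name in self.dormant:
            self.active[name] = self.dormant.pop(name)
        self.active.move_to_end(name)  # mark as most recently used
        self._demote_if_full()
        return list(self.active[name])

    def _demote_if_full(self) -> None:
        # Assumption: LRU eviction replaces the paper's scheduling algorithm.
        while len(self.active) > self.max_active:
            victim, rows = self.active.popitem(last=False)
            self.dormant[victim] = rows


cache = ContinuousQueryCache(max_active=1)
cache.register("hot", lambda r: r["temp"] > 30)
cache.register("cold", lambda r: r["temp"] < 0)  # "hot" is demoted here
cache.ingest({"temp": 35.0})
cache.ingest({"temp": -5.0})
hot_rows = cache.results("hot")  # promotes "hot", demotes "cold"
```

In this sketch a demoted query keeps receiving new arrivals, so promoting it later never requires a fresh scan of the underlying store; whether the paper's dormant cache behaves this way is not stated in the summary.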