Adaptive cache policy scheduling for big data applications on distributed tiered storage system
Author(s) - Gu Rong, Li Chongjie, Shu Peng, Yuan Chunfeng, Huang Yihua
Publication year - 2019
Publication title - Concurrency and Computation: Practice and Experience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.309
H-Index - 67
eISSN - 1532-0634
pISSN - 1532-0626
DOI - 10.1002/cpe.5138
Subject(s) - computer science, cache, cache algorithms, cache pollution, cache coloring, distributed computing, cache invalidation, page cache, smart cache, operating system, computer network, cpu cache
Summary Multitiered storage systems, which are made up of heterogeneous devices, are widely used in distributed environments to accelerate the I/O performance of upper-layer big data applications. This raises new challenges for efficient data migration among heterogeneous storage levels, such as MEM‐SSD‐HDD, through smart caching mechanisms. To optimize the cache policy scheduling mechanism on the distributed tiered storage architecture, we propose a general framework with five layers: a tiered storage system layer, a cache migration policy layer, a cache policy adaptive scheduling layer, a data access pattern layer, and a big data application layer. A prototype of the framework has been designed and implemented on Alluxio, a widely used distributed hybrid storage system. To meet the demands of the big data application layer, we designed a number of cache eviction policies and promotion policies covering various access patterns on the cache migration policy layer (several of the proposed eviction policies have been adopted by the Alluxio open‐source community). In addition, two adaptive cache policy scheduling algorithms for selecting appropriate policies in various scenarios are designed and implemented on the cache policy adaptive scheduling layer. The two scheduling algorithms are based on hit ratio statistics and on data access pattern model prediction, respectively. Experimental results show that the proposed cache policies are effective for various big data applications, such as Spark SQL. The proposed cache policy scheduling algorithms, combined with multiple eviction policies, improve the hit ratio by around 20% compared with using a single eviction policy.
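
The summary does not spell out the scheduling algorithms, so the following is only a minimal sketch of the general idea behind hit-ratio-based policy selection, not the authors' method and not Alluxio's API. It assumes two stand-in eviction policies (LRU and LFU, which the summary does not name), replays a recent window of block accesses against each, and picks the policy with the highest simulated hit ratio. All function names here are hypothetical.

```python
from collections import Counter, OrderedDict


def hit_ratio_lru(trace, capacity):
    """Replay `trace` through an LRU cache and return the hit ratio."""
    cache = OrderedDict()
    hits = 0
    for key in trace:
        if key in cache:
            hits += 1
            cache.move_to_end(key)          # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)   # evict least recently used
            cache[key] = True
    return hits / len(trace) if trace else 0.0


def hit_ratio_lfu(trace, capacity):
    """Replay `trace` through an LFU cache and return the hit ratio.

    Frequencies are counted over the whole trace seen so far; ties are
    broken in favor of evicting the block that entered the cache first.
    """
    cache = {}                              # dict keeps insertion order
    freq = Counter()
    hits = 0
    for key in trace:
        freq[key] += 1
        if key in cache:
            hits += 1
        else:
            if len(cache) >= capacity:
                victim = min(cache, key=lambda k: freq[k])
                del cache[victim]
            cache[key] = True
    return hits / len(trace) if trace else 0.0


# Hypothetical policy registry: name -> simulator function.
POLICIES = {"LRU": hit_ratio_lru, "LFU": hit_ratio_lfu}


def choose_policy(window, capacity, policies=POLICIES):
    """Return the name of the policy with the best hit ratio on `window`."""
    return max(policies, key=lambda name: policies[name](window, capacity))


# A small hot set ("a", "b") interleaved with one-off scan blocks.
hot_plus_scan = ["a", "b", "a", "b", "x1", "x2",
                 "a", "b", "x3", "x4", "a", "b"]
# A working set that shifts over time.
shifting = [1, 2, 1, 2, 3, 4, 3, 4, 5, 6, 5, 6]

best_for_scan = choose_policy(hot_plus_scan, capacity=2)
best_for_shift = choose_policy(shifting, capacity=2)
```

In this sketch the frequency-based policy wins on the trace with a hot set plus scans, while the recency-based policy wins on the shifting working set, which illustrates why no single eviction policy suits every access pattern. A real scheduler would presumably re-evaluate periodically on live statistics rather than replaying a stored trace.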
