Adaptive cache policy scheduling for big data applications on distributed tiered storage system



Author(s) - Gu Rong, Li Chongjie, Shu Peng, Yuan Chunfeng, Huang Yihua

Publication year - 2019

Publication title - Concurrency and Computation: Practice and Experience

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.309

H-Index - 67

eISSN - 1532-0634

pISSN - 1532-0626

DOI - 10.1002/cpe.5138

Subject(s) - computer science, cache, cache algorithms, cache pollution, cache coloring, distributed computing, cache invalidation, page cache, smart cache, operating system, computer network, CPU cache

Summary - Multitiered storage systems, which are built from heterogeneous devices, are widely used in distributed environments to accelerate the I/O performance of upper-layer big data applications. This raises new challenges for efficient data migration through smart caching mechanisms among heterogeneous storage levels, such as MEM-SSD-HDD. To optimize cache policy scheduling on the distributed tiered storage architecture, we propose a general framework with five layers: a tiered storage system layer, a cache migration policy layer, a cache policy adaptive scheduling layer, a data access pattern layer, and a big data application layer. A prototype of the framework has been designed and implemented on Alluxio, a widely used distributed hybrid storage system. To meet the demands of the big data application layer, we first designed several cache eviction and promotion policies covering various access patterns on the cache migration policy layer (several of the proposed eviction policies have been adopted by the Alluxio open-source community). Second, we designed and implemented two adaptive cache policy scheduling algorithms on the cache policy adaptive scheduling layer, which select appropriate policies in various scenarios. One algorithm is based on hit ratio statistics and the other on data access pattern model prediction. Experimental results show that the proposed cache policies are effective for various big data applications, such as Spark SQL. Compared with using a single eviction policy, the proposed scheduling algorithms, which choose among various eviction policies, improve the hit ratio by around 20%.
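The idea of choosing an eviction policy from hit ratio statistics can be illustrated with a minimal Python sketch. This is not the authors' algorithm and does not use any Alluxio API. The class names (`LRUCache`, `LFUCache`, `HitRatioScheduler`), the shadow-cache approach, and the fixed evaluation window are all assumptions made for illustration. The sketch replays every access on one small simulated cache per candidate policy and, at the end of each window, marks the policy with the most hits as active.

```python
from collections import Counter, OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used key."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def access(self, key):
        """Record an access; return True on a hit, False on a miss."""
        if key in self.items:
            self.items.move_to_end(key)  # mark as most recently used
            return True
        if len(self.items) >= self.capacity:
            self.items.popitem(last=False)  # drop the oldest entry
        self.items[key] = None
        return False


class LFUCache:
    """Fixed-capacity cache that evicts the least frequently used key."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.counts = Counter()

    def access(self, key):
        """Record an access; return True on a hit, False on a miss."""
        if key in self.counts:
            self.counts[key] += 1
            return True
        if len(self.counts) >= self.capacity:
            # Ties are broken by insertion order (oldest key first).
            victim = min(self.counts, key=self.counts.get)
            del self.counts[victim]
        self.counts[key] = 1
        return False


class HitRatioScheduler:
    """Hypothetical scheduler: one shadow cache per candidate policy.

    Every `window` accesses, the policy whose shadow cache scored the
    most hits in that window becomes the active policy.
    """

    def __init__(self, policies, window):
        self.policies = policies  # maps policy name to a shadow cache
        self.window = window
        self.hits = {name: 0 for name in policies}
        self.seen = 0
        self.active = next(iter(policies))  # start with the first policy

    def record(self, key):
        """Replay one access on every shadow cache; return the active policy."""
        for name, cache in self.policies.items():
            if cache.access(key):
                self.hits[name] += 1
        self.seen += 1
        if self.seen == self.window:
            self.active = max(self.hits, key=self.hits.get)
            self.hits = {name: 0 for name in self.policies}
            self.seen = 0
        return self.active


# Illustrative workload: two hot keys interleaved with one-off scan keys.
workload = ["a", "a", "b", "b"]
for r in range(4):
    workload += [f"scan-{r}-{i}" for i in range(3)] + ["a", "b"]

scheduler = HitRatioScheduler(
    {"lru": LRUCache(3), "lfu": LFUCache(3)}, window=len(workload)
)
for key in workload:
    chosen = scheduler.record(key)
```

In this workload the scan keys keep pushing the hot keys out of the recency-ordered cache, so the frequency-based shadow cache collects more hits and is selected at the end of the window. A real system would also need to account for the cost of switching policies and of maintaining the shadow state, which this sketch ignores.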
