Automated performance tuning of distributed storage system based on deep reinforcement learning
Author(s) -
Lu Wang,
Wentao Zhang,
Yaodong Cheng
Publication year - 2020
Publication title -
Journal of Physics: Conference Series
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.21
H-Index - 85
eISSN - 1742-6596
pISSN - 1742-6588
DOI - 10.1088/1742-6596/1525/1/012090
Subject(s) - lustre (file system) , reinforcement learning , computer science , workload , artificial intelligence , deep learning , machine learning , learning classifier system , reinforcement , file system , task (project management) , operating system , engineering , structural engineering , systems engineering
Automated performance tuning is a difficult task for a large-scale storage system. Traditional methods rely heavily on the experience of system administrators and cannot adapt to changes in workloads and system configurations. Reinforcement learning is a promising machine learning paradigm that learns an optimized strategy through trial and error between an agent and its environment. Combined with the strong feature-learning capability of deep learning, deep reinforcement learning has shown success in many fields. We implemented a performance parameter tuning engine based on deep reinforcement learning for the Lustre file system, a distributed file system widely used in HEP data centres. Three reinforcement learning algorithms are enabled in the tuning engine: Deep Q-learning, A2C, and PPO. Experiments show that, on a small test bed with an IOzone workload, this method can increase random read throughput by about 30% compared to the default settings of Lustre. In the future, it may be possible to apply this method to other parameter-tuning use cases in data centre operations.
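To illustrate the tune-measure-reward loop the abstract describes, here is a minimal sketch using a tabular Q-learning stand-in rather than the paper's deep RL algorithms (DQN, A2C, PPO). The parameter name, candidate values, and the simulated throughput curve are all hypothetical: a real engine would apply the setting to Lustre and measure throughput with an IOzone run instead of calling the toy `benchmark` function below.

```python
import random

random.seed(0)

# Hypothetical tunable inspired by a read-ahead-style Lustre parameter;
# the simulated curve in benchmark() stands in for a real IOzone run.
LEVELS = [4, 8, 16, 32, 64, 128]          # candidate settings (e.g. MB)

def benchmark(level_idx):
    """Stand-in for one benchmark run: throughput peaks at LEVELS[3]."""
    return 100.0 - 15.0 * abs(level_idx - 3) + random.uniform(-2.0, 2.0)

# Tabular Q-learning: state = current setting, action = lower/keep/raise.
ACTIONS = [-1, 0, +1]
Q = {(s, a): 0.0 for s in range(len(LEVELS)) for a in range(len(ACTIONS))}
alpha, gamma, eps = 0.3, 0.9, 0.2

state = 0
for step in range(3000):
    if step % 50 == 0:                     # periodic restarts for coverage
        state = random.randrange(len(LEVELS))
    if random.random() < eps:              # epsilon-greedy exploration
        a = random.randrange(len(ACTIONS))
    else:
        a = max(range(len(ACTIONS)), key=lambda x: Q[(state, x)])
    nxt = min(max(state + ACTIONS[a], 0), len(LEVELS) - 1)
    reward = benchmark(nxt)                # "measure" throughput
    best_next = max(Q[(nxt, x)] for x in range(len(ACTIONS)))
    Q[(state, a)] += alpha * (reward + gamma * best_next - Q[(state, a)])
    state = nxt

# Greedy rollout from the default setting to the learned optimum.
state = 0
for _ in range(10):
    a = max(range(len(ACTIONS)), key=lambda x: Q[(state, x)])
    state = min(max(state + ACTIONS[a], 0), len(LEVELS) - 1)
print("recommended setting:", LEVELS[state])
```

The deep RL variants in the paper replace the Q-table with a neural network so that richer system state (workload features, I/O statistics) can be fed in, but the interaction loop is the same: apply a setting, observe throughput as the reward, update the policy.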
