A Reliability Model for Dependent and Distributed MDS Disk Array Units
Author(s) - Şuayb Ş. Arslan
Publication year - 2018
Publication title - IEEE Transactions on Reliability
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.032
H-Index - 102
eISSN - 1558-1721
pISSN - 0018-9529
DOI - 10.1109/tr.2018.2878503
Subject(s) - backup, computer science, distributed data store, erasure code, distributed computing, disk array, reliability engineering, terabyte, raid, reliability (semiconductor), bandwidth (computing), mean time between failures, cloud storage, failure rate, cloud computing, computer network, database, algorithm, computer hardware, decoding methods, operating system, engineering, power (physics), physics, quantum mechanics
Archiving and systematic backup of large digital data create a rapidly growing demand for multi-petabyte scale storage systems. As drive capacities continue to grow beyond the few-terabyte range to address the demands of today's cloud, multiple simultaneous disk failures have become a reality. Among the main factors causing catastrophic system failures, correlated disk failures and network bandwidth are reported to be the two common sources of performance degradation. The emerging trend is to use efficient, sophisticated erasure codes (EC) equipped with multiple parities and efficient repairs in order to meet the reliability and bandwidth requirements. It is known that the mean time to failure and repair rates reported by disk manufacturers cannot capture the life-cycle patterns of distributed storage systems. In this study, we develop failure models based on generalized Markov chains that can accurately capture correlated performance degradations with multiparity protection schemes based on modern maximum distance separable (MDS) EC. Furthermore, we use the proposed model in a distributed storage scenario to quantify two example use cases: first, the common intuition that adding more parity disks is only meaningful if there is decent decorrelation between the failure domains of storage systems; and second, the reliability of generic multiple single-dimensional EC-protected storage systems.
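For orientation, the kind of Markov-chain reliability calculation the abstract refers to can be sketched with the textbook birth-death chain for an (n, k) MDS array. This is not the generalized model developed in the paper: it assumes independent, exponentially distributed disk failures and a single repair in progress at a time, which is exactly the simplification the paper argues is insufficient. The function name `mttdl_mds` and its parameters are hypothetical, chosen only for this illustration.

```python
def mttdl_mds(n, k, fail_rate, repair_rate):
    """Mean time to data loss (MTTDL) for an (n, k) MDS-protected array.

    Assumptions (textbook model, NOT the paper's generalized chain):
      - each working disk fails independently at rate `fail_rate`
      - one repair at a time, completing at rate `repair_rate`
      - state i = number of failed disks; data is lost once more than
        n - k disks are down at the same time
    """
    if not 0 < k <= n:
        raise ValueError("need 0 < k <= n")
    if fail_rate <= 0 or repair_rate < 0:
        raise ValueError("need fail_rate > 0 and repair_rate >= 0")

    parities = n - k
    total = 0.0
    hop = 0.0  # expected time to move from state i-1 to state i
    for failed in range(parities + 1):
        f = (n - failed) * fail_rate              # rate of the next failure
        r = repair_rate if failed > 0 else 0.0    # rate of returning to failed-1
        # First-passage recursion for a birth-death chain:
        #   h_i = (1 + r_i * h_{i-1}) / f_i
        hop = (1.0 + r * hop) / f
        total += hop                              # MTTDL = h_0 + ... + h_{n-k}
    return total


if __name__ == "__main__":
    # Illustrative rates only (per hour); not taken from the paper.
    for parity_count in range(0, 4):
        print(parity_count, mttdl_mds(8 + parity_count, 8, 1e-5, 1e-1))
```

Under these independence assumptions each added parity disk increases the MTTDL substantially, which is the optimistic picture that correlated failure domains undermine.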