Reliability Analysis of Storage Systems With Partially Repairable Devices
Author(s) - Serkay Olmez
Publication year - 2021
Publication title - IEEE Transactions on Device and Materials Reliability
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.384
H-Index - 70
eISSN - 1558-2574
pISSN - 1530-4388
DOI - 10.1109/tdmr.2021.3077848
Subject(s) - engineered materials, dielectrics and plasmas; components, circuits, devices and systems; power, energy and industry applications
Modern storage devices such as hard disk drives (HDDs) and solid state drives (SSDs) have reached capacities beyond 18 TB. When such a device fails, its data must be recovered from parities. Given the large capacities, the recovery process may take up to a few days, depending on the bandwidth and the erasure coding scheme implemented. During the recovery, the system is vulnerable to data loss if additional device failures occur. Therefore, it is important to complete the recovery as quickly as possible. The recovery can be accelerated if the data on the failed device is only partially corrupted and the remaining portion is still accessible. This is indeed the case for storage devices that consist of multiple physical recording subsystems. For example, modern HDDs have up to 18 heads, and SSDs have multiple flash chips. These subsystems may fail independently without affecting the rest of the components in the device. In this work, we study the durability of data when the device is allowed to stay online even after a number of subcomponents fail. In addition to extending the lifetime of the devices, this also allows for faster recovery of the critical data stored on the failed subsystem, which results in significant gains in the overall data durability of the storage system.
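
As a rough illustration of why partial repair shortens the vulnerable window, the following back-of-the-envelope sketch compares the time to rebuild an entire 18 TB device with the time to rebuild only the data behind one failed head. It is not the model used in the paper. The 100 MB/s rebuild bandwidth, the even spread of data across 18 heads, and the helper name `rebuild_hours` are all assumptions made for this example.

```python
def rebuild_hours(data_tb: float, bandwidth_mb_s: float) -> float:
    """Hours needed to rebuild `data_tb` terabytes at `bandwidth_mb_s` MB/s."""
    data_mb = data_tb * 1_000_000  # decimal units: 1 TB = 10^6 MB
    return data_mb / bandwidth_mb_s / 3600


CAPACITY_TB = 18.0      # device capacity quoted in the abstract
HEADS = 18              # upper head count quoted in the abstract
BANDWIDTH_MB_S = 100.0  # ASSUMED effective rebuild bandwidth, not from the paper

# Full-device failure: every byte must be reconstructed from parity.
full = rebuild_hours(CAPACITY_TB, BANDWIDTH_MB_S)

# Single-head failure, ASSUMING data is spread evenly across heads:
# only 1/HEADS of the capacity needs to be reconstructed.
partial = rebuild_hours(CAPACITY_TB / HEADS, BANDWIDTH_MB_S)

print(f"full rebuild:    {full:.1f} h")
print(f"partial rebuild: {partial:.1f} h")
print(f"speed-up:        {full / partial:.0f}x")
```

Under these assumptions the full rebuild takes about two days, in line with the "few days" figure above, while the single-head rebuild takes under three hours. The actual gain depends on the erasure coding scheme and on how data is laid out across the subsystems.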
