Open Access
Understanding the Limits of Passive Realtime Datacenter Fault Detection and Localization
Author(s) - Arjun Roy, Rajdeep Das, Hongyi Zeng, Jasmeet Bagga, Alex C. Snoeren
Publication year - 2019
Publication title - IEEE/ACM Transactions on Networking
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.022
H-Index - 174
eISSN - 1558-2566
pISSN - 1063-6692
DOI - 10.1109/tnet.2019.2938228
Subject(s) - computer science, router, network packet, host (biology), reliability (semiconductor), real time computing, process (computing), outlier, fault detection and isolation, fault (geology), computer network, artificial intelligence, ecology, power (physics), seismology, actuator, biology, geology, operating system, physics, quantum mechanics
Datacenters are characterized by large scale, stringent reliability requirements, and significant application diversity. However, the realities of employing hardware with non-zero failure rates mean that datacenters are subject to significant numbers of failures that can impact performance. Moreover, failures are not always obvious; network components can fail partially, dropping or delaying only subsets of packets. Thus, traditional fault detection techniques involving end-host or router-based statistics can fall short in their ability to identify these errors. We describe how to expedite the process of detecting and localizing partial datacenter faults using an end-host method generalizable to most datacenter applications. In particular, we correlate end-host transport-layer flow metrics with per-flow network paths and apply statistical analysis techniques to identify outliers and localize faulty links and/or switches. We evaluate our approach in a production Facebook front-end datacenter, focusing on its effectiveness across a range of traffic patterns.
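
The general idea in the abstract, correlating per-flow transport metrics with the links each flow traverses and then looking for statistical outliers, can be illustrated with a minimal sketch. This is not the authors' method: the function names, the input tuple layout, the choice of TCP retransmissions as the metric, and the median-absolute-deviation (MAD) outlier rule with a 3.5 cutoff are all assumptions made here for illustration only.

```python
from collections import defaultdict
from statistics import median


def link_retransmit_rates(flows):
    """Aggregate per-flow counters onto every link the flow traversed.

    `flows` is an iterable of (path, packets_sent, retransmits) tuples,
    where `path` is a list of hypothetical link identifiers.
    Returns {link: retransmits / packets_sent} over all flows on that link.
    """
    sent = defaultdict(int)
    retx = defaultdict(int)
    for path, packets, retransmits in flows:
        for link in path:
            sent[link] += packets
            retx[link] += retransmits
    return {link: retx[link] / sent[link] for link in sent if sent[link] > 0}


def outlier_links(rates, threshold=3.5):
    """Flag links whose rate is unusually high compared with their peers.

    Uses a modified z-score based on the median absolute deviation (MAD).
    The threshold is an illustrative assumption, not a value from the paper.
    """
    values = list(rates.values())
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # Degenerate case: at least half the links share the median rate,
        # so any link strictly above the median is reported.
        return [link for link, v in rates.items() if v > med]
    # 0.6745 rescales the MAD so scores are comparable to standard
    # deviations when the data are roughly normal.
    return [
        link for link, v in rates.items()
        if 0.6745 * (v - med) / mad > threshold
    ]
```

A caller would feed in one tuple per observed flow and compare only links that are expected to behave alike (for example, equal-cost uplinks from the same switch); a link that carries a disproportionate share of retransmissions then stands out from its peers. A production system would also need to account for sample size and for differences in traffic mix, which this sketch ignores.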
