Open Access
Performance Health Monitoring of Large-Scale Systems
Author(s) - Ram Rajamony
Publication year - 2014
Language(s) - English
Resource type - Reports
DOI - 10.2172/1164888
Subject(s) - isolation (microbiology), scale (ratio), reliability engineering, computer science, fault detection and isolation, systems engineering, software, remedial action, fault (geology), distributed computing, engineering, operating system, ecology, physics, quantum mechanics, contamination, artificial intelligence, seismology, geology, environmental remediation, microbiology and biotechnology, actuator, biology
This report details the progress made on the ASCR-funded project Performance Health Monitoring for Large Scale Systems (PHM). A large-scale application may not achieve its full performance potential because of degraded performance in even a single subsystem. Detecting performance faults, isolating them, and taking remedial action are critical at the scale of systems on the horizon. PHM aims to develop techniques and tools that can be used to identify and mitigate such performance problems. The work has two main parts. First, the PHM framework encompasses diagnostics, system monitoring, fault isolation, and performance evaluation capabilities that indicate when a performance fault has been detected, whether it is caused by an anomaly in the system itself or by contention for shared resources between concurrently executing jobs. Second, software components called the PHM Control system build upon the capabilities provided by the PHM framework to mitigate degradation caused by performance problems.
