
Learning Towards Failure Prediction of High Performance Computing Clusters by Employing LSTM
Author(s) - Kamaljit Kaur, Kuljit Kaur
Publication year - 2019
Publication title - International Journal of Engineering and Advanced Technology
Language(s) - English
Resource type - Journals
ISSN - 2249-8958
DOI - 10.35940/ijeat.f7885.088619
Subject(s) - computer science, long short-term memory, reinforcement learning, artificial intelligence, machine learning, artificial neural network, recurrent neural network
Failure prediction for high-performance computing clusters (HPCC) has been a crucial and actively studied problem for many years. Previous works have failed to provide a robust method for real-time failure prediction of HPCC. The available techniques are dated and unrealistic, and they provide low accuracy. This paper presents an efficient technique that provides robust failure prediction with good accuracy using state-of-the-art models. We employ long short-term memory (LSTM) networks together with reinforcement learning to correct predictions in real time and to give industry a solution with reliable results.
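The abstract names LSTM as the core predictor but gives no architecture details. As a rough illustration only, the sketch below shows the standard LSTM cell update applied to a short window of node metrics, followed by a sigmoid readout that yields a failure probability. Everything specific here is an assumption for illustration and is not taken from the paper: the class `LSTMCell`, the helper `predict_failure_probability`, the three-feature input, the hidden size, the random untrained weights, and the readout layer. The reinforcement-learning correction mentioned in the abstract is not sketched, since the abstract does not describe it.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


class LSTMCell:
    """Standard LSTM cell with forget (f), input (i), candidate (g), output (o) gates.

    Weights are random and untrained; this is a hypothetical illustration,
    not the model used in the paper.
    """

    GATES = ("f", "i", "g", "o")

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        concat = input_size + hidden_size
        # One weight matrix and one bias vector per gate.
        self.W = {
            k: [[rng.uniform(-0.1, 0.1) for _ in range(concat)]
                for _ in range(hidden_size)]
            for k in self.GATES
        }
        self.b = {k: [0.0] * hidden_size for k in self.GATES}

    def step(self, x, h, c):
        """Advance one time step; returns the new hidden and cell states."""
        z = list(x) + list(h)  # concatenate input with previous hidden state
        pre = {
            k: [a + b for a, b in zip(matvec(self.W[k], z), self.b[k])]
            for k in self.GATES
        }
        f = [sigmoid(v) for v in pre["f"]]    # how much old memory to keep
        i = [sigmoid(v) for v in pre["i"]]    # how much new candidate to admit
        g = [math.tanh(v) for v in pre["g"]]  # candidate memory content
        o = [sigmoid(v) for v in pre["o"]]    # how much memory to expose
        c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
        return h_new, c_new


def predict_failure_probability(cell, readout_w, readout_b, sequence):
    """Run the cell over a sequence and map the last hidden state to (0, 1)."""
    h = [0.0] * cell.hidden_size
    c = [0.0] * cell.hidden_size
    for x in sequence:
        h, c = cell.step(x, h, c)
    logit = sum(w * hj for w, hj in zip(readout_w, h)) + readout_b
    return sigmoid(logit)


# Hypothetical window of three time steps; each row holds three made-up,
# normalized node metrics (for example load, temperature, error flag).
window = [
    [0.2, 0.1, 0.0],
    [0.4, 0.3, 1.0],
    [0.9, 0.8, 1.0],
]
cell = LSTMCell(input_size=3, hidden_size=4)
probability = predict_failure_probability(cell, [0.5] * 4, 0.0, window)
print(probability)
```

With untrained weights the printed value carries no meaning; in practice the weights would be fitted to labeled cluster logs, typically with a library implementation of LSTM rather than a hand-written cell.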