Open Access
FAULT TOLERATING MECHANISM IN DISTRIBUTED COMPUTING ENVIRONMENT
Author(s) - Lokendra Gour, Akhilesh A. Waoo
Publication year - 2020
Publication title - International Journal of Engineering Applied Science and Technology
Language(s) - English
Resource type - Journals
ISSN - 2455-2143
DOI - 10.33564/ijeast.2020.v05i04.096
Subject(s) - mechanism (biology), distributed computing, computer science, distributed computing environment, physics, quantum mechanics
Large-scale distributed systems comprise heterogeneous computational machines, workloads, and sub-systems dispersed across the cloud environment. These sub-systems frequently encounter faults and failures caused by differing data structures, hardware or software malfunctions, and communication delays. To keep computation fast under such conditions, a fault-tolerant infrastructure is implemented using a machine learning approach: an artificial neural network (ANN) captures, manipulates, and updates the states and behaviors of the sub-systems on the server and worker machines. Multiple layers of neurons (i.e., deep learning) can handle large-scale distributed systems with large datasets. Applying variants of the stochastic gradient descent algorithm on the sub-systems (also known as computational nodes) significantly enhances the efficiency and reliability of a distributed system. In high-performance computing (HPC) applications, fault tolerance mechanisms must be embedded to recover from system failures.

Keywords: Distributed System, Cloud Environment, Fault Tolerance, Machine Learning, Artificial Neural Network
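
The abstract does not spell out how stochastic gradient descent on computational nodes copes with failures, so the following is only a minimal sketch of one common pattern, not the authors' method. A server averages gradients from whichever worker nodes respond in a step and ignores the ones that fail. The function names, the single-weight model y = w * x, the failure probability, and the data are all hypothetical and chosen for illustration.

```python
import random


def worker_gradient(w, shard, fail_prob, rng):
    """Gradient of the mean squared error for y = w * x on one worker's shard.

    Returns None to simulate a node fault (assumption: faults are independent
    and occur with probability fail_prob on every step).
    """
    if rng.random() < fail_prob:
        return None
    return sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)


def fault_tolerant_sgd(shards, steps=200, lr=0.05, fail_prob=0.3, seed=0):
    """Server loop: average the gradients of the workers that responded."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(steps):
        results = (worker_gradient(w, s, fail_prob, rng) for s in shards)
        grads = [g for g in results if g is not None]
        if not grads:
            # Every worker failed in this step: keep the last known weight.
            continue
        w -= lr * sum(grads) / len(grads)
    return w


# Hypothetical noise-free data following y = 3x, split across four workers.
data = [(i / 10, 3 * i / 10) for i in range(1, 21)]
shards = [data[k::4] for k in range(4)]

w = fault_tolerant_sgd(shards)
print(f"estimated weight: {w:.6f}")
```

Because each surviving worker's gradient still points toward the same optimum, dropping failed workers slows a step's progress but does not derail training in this toy setting. A real deployment would also need failure detection, checkpointing, and node recovery, none of which are modelled here.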
