
Performance evaluation and resource optimization of cloud based parallel Hadoop clusters with an intelligent scheduler.
Author(s) - S. Manishankar, S. Sathayanarayana
Publication year - 2018
Publication title - International Journal of Engineering and Technology
Language(s) - English
Resource type - Journals
ISSN - 2227-524X
DOI - 10.14419/ijet.v7i4.13372
Subject(s) - computer science, scalability, cloud computing, distributed computing, scheduling (production processes), job scheduler, computation, node (physics), big data, real time computing, operating system, engineering, operations management, structural engineering, algorithm
Data generated by real-time information systems is incremental in nature. Processing such large volumes of incremental data at scale requires a parallel processing system such as a Hadoop-based cluster. A major challenge in any cluster-based system is how efficiently its resources can be used. This research proposes a model architecture for a Hadoop cluster with two additional integrated components: a super node, which manages the cluster's computations, and a mediation manager, which performs performance monitoring and evaluation. The super node is equipped with an intelligent (adaptive) scheduler that schedules each job with optimal resources. The scheduler is termed intelligent because it automatically decides which resource to use for which computation, by cross-mapping resources and jobs with a genetic algorithm that finds the best-matching resource. The mediation node deploys Ganglia, a standard monitoring tool for Hadoop clusters, to collect and record the cluster's performance parameters. Overall, the system schedules different jobs with optimal use of resources, thereby achieving better efficiency than Hadoop's native Capacity Scheduler. The system is deployed on top of an OpenNebula cloud environment for scalability.
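The abstract does not describe the chromosome encoding, fitness function, or genetic operators used by the scheduler, so the following is only a minimal sketch of how a genetic algorithm might cross-map jobs to resources. Everything in it is an assumption made for illustration: the names `makespan` and `genetic_schedule`, the use of per-job cost and per-node speed as inputs, the makespan fitness, and the choice of elitist selection, one-point crossover, and random-reset mutation. It is not the authors' implementation and does not interact with Hadoop, Ganglia, or OpenNebula.

```python
import random
from typing import List, Sequence, Tuple


def makespan(assignment: Sequence[int],
             job_cost: Sequence[float],
             node_speed: Sequence[float]) -> float:
    """Finish time of the busiest node under a job-to-node mapping (assumed fitness)."""
    load = [0.0] * len(node_speed)
    for job, node in enumerate(assignment):
        load[node] += job_cost[job] / node_speed[node]
    return max(load)


def genetic_schedule(job_cost: Sequence[float],
                     node_speed: Sequence[float],
                     pop_size: int = 30,
                     generations: int = 60,
                     mutation_rate: float = 0.1,
                     seed: int = 0) -> Tuple[List[int], float]:
    """Search for a job-to-node mapping with a small makespan.

    A chromosome is a list whose i-th gene is the index of the node
    that runs job i (hypothetical encoding, not taken from the paper).
    """
    rng = random.Random(seed)  # seeded so that runs are reproducible
    n_jobs, n_nodes = len(job_cost), len(node_speed)

    def fitness(chromosome: Sequence[int]) -> float:
        return makespan(chromosome, job_cost, node_speed)

    # Start from random mappings.
    population = [[rng.randrange(n_nodes) for _ in range(n_jobs)]
                  for _ in range(pop_size)]

    for _ in range(generations):
        # Elitist selection: the better half survives unchanged.
        population.sort(key=fitness)
        survivors = population[: pop_size // 2]

        children = []
        while len(survivors) + len(children) < pop_size:
            parent_a, parent_b = rng.sample(survivors, 2)
            # One-point crossover between two surviving mappings.
            cut = rng.randrange(1, n_jobs) if n_jobs > 1 else 0
            child = parent_a[:cut] + parent_b[cut:]
            # Random-reset mutation: occasionally move a job to a random node.
            child = [rng.randrange(n_nodes) if rng.random() < mutation_rate else gene
                     for gene in child]
            children.append(child)
        population = survivors + children

    best = min(population, key=fitness)
    return best, fitness(best)
```

As a usage example under the same assumptions, `genetic_schedule([4.0, 2.0, 7.0, 1.0, 3.0, 5.0], [1.0, 2.0, 1.5])` returns a list giving the chosen node index for each of the six jobs, together with the resulting makespan. In a real deployment the cost and speed inputs would presumably be derived from the performance parameters recorded by the monitoring component, but the abstract does not say how.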