
A Novel Clustering Algorithm to Process Big Data Using Hadoop Framework
Author(s) -
Mrs. D. Jayalatchumy*,
P. Thambidurai,
Mr. D. Kadhirvelu
Publication year - 2019
Publication title -
International Journal of Engineering and Advanced Technology
Language(s) - English
Resource type - Journals
ISSN - 2249-8958
DOI - 10.35940/ijeat.f8874.088619
Subject(s) - cluster analysis, computer science, big data, data mining, scalability, process (computing), data stream clustering, cure data clustering algorithm, algorithm, correlation clustering, machine learning, database, operating system
The real challenge for data miners lies in extracting useful information from huge datasets. Choosing an efficient algorithm to analyze and process such unstructured data is a challenge in itself. Cluster analysis is an unsupervised technique for gaining insight into data in the era of Big Data. Hyperflated PIC (HPIC) is a Big Data processing solution designed to take advantage of clustering. It is a scalable, efficient algorithm that addresses the shortcomings of existing clustering algorithms and can process huge datasets quickly. The HPIC algorithms have been validated through experiments on synthetic and real datasets using different evaluation measures. The quality of the clustering results has also been analyzed, and the algorithms were shown to be highly efficient and suitable for Big Data processing.
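
The abstract does not expand "PIC" or describe how HPIC modifies it. Assuming PIC refers to power iteration clustering, the sketch below illustrates only that baseline idea on a single machine: build a row-normalized affinity matrix, run a few power iterations to obtain a one-dimensional embedding, then group the embedding values. It is not the authors' HPIC algorithm and does not use Hadoop. The function names, the Gaussian affinity, the `sigma` parameter, the fixed iteration count (standing in for an early-stopping rule), and the simple 1-D k-means step are all assumptions made for illustration.

```python
import math


def pic_embedding(points, sigma=1.0, iters=20):
    """Return a 1-D power-iteration embedding of the points (illustrative only)."""
    n = len(points)
    # Assumed affinity: Gaussian kernel on squared Euclidean distance, zero diagonal.
    affinity = [
        [
            0.0 if i == j else math.exp(
                -sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                / (2.0 * sigma ** 2)
            )
            for j in range(n)
        ]
        for i in range(n)
    ]
    degree = [sum(row) for row in affinity]
    # Row-normalize so that each row of W sums to 1.
    W = [[affinity[i][j] / degree[i] for j in range(n)] for i in range(n)]
    # Start from the degree vector, scaled to unit L1 norm.
    total = sum(degree)
    v = [d / total for d in degree]
    for _ in range(iters):
        v = [sum(W[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(abs(x) for x in v)
        v = [x / norm for x in v]
    return v


def kmeans_1d(values, k, iters=50):
    """Group scalar values into k clusters (k >= 2); centers start evenly spaced."""
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * c / (k - 1) for c in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assign each value to its nearest center.
        labels = [min(range(k), key=lambda c: abs(x - centers[c])) for x in values]
        # Move each non-empty center to the mean of its members.
        for c in range(k):
            members = [x for x, lab in zip(values, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels


def pic_cluster(points, k, sigma=1.0, iters=20):
    """Cluster points by running 1-D k-means on the power-iteration embedding."""
    return kmeans_1d(pic_embedding(points, sigma, iters), k)


if __name__ == "__main__":
    # Two well-separated groups of different sizes (synthetic toy data).
    toy = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
           (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
    print(pic_cluster(toy, k=2))
```

The toy groups deliberately differ in size: with a degree-based starting vector, two groups with identical internal structure would receive identical embedding values and could not be told apart. A distributed version would need to partition the affinity rows and the matrix-vector product across workers, which this sketch does not attempt.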