Open Access
STCLARanS: An Improved Clustering Large Applications based on Randomized Search Algorithm using Slim-tree Technique
Author(s) - Ricardo Q. Camungao
Publication year - 2019
Publication title - International Journal of Recent Technology and Engineering
Language(s) - English
Resource type - Journals
ISSN - 2277-3878
DOI - 10.35940/ijrte.b1022.078219
Subject(s) - cluster analysis , computer science , data mining , cure data clustering algorithm , correlation clustering , medoid , canopy clustering algorithm , data stream clustering , tree (set theory) , algorithm , fuzzy clustering , single linkage clustering , artificial intelligence , pattern recognition (psychology) , mathematics , mathematical analysis
Clustering has been used in recent years for data interpretation in large databases in fields such as medicine, business, and engineering. Its use paved the way for the development of data mining techniques such as the CLARANS (Clustering Large Applications based on Randomized Search) algorithm. CLARANS is regarded as one of the most efficient k-medoids techniques; it uses a randomized search strategy to identify the best medoids in a large dataset, and it surpasses both PAM (Partitioning Around Medoids) and CLARA (Clustering Large Applications) in terms of running time. This paper integrates the Slim-tree method into CLARANS to develop the proposed Slim-tree Clustering Large Applications based on Randomized Search (STCLARanS) algorithm. An experimental evaluation using synthetic and real datasets compares the quality of the clustered output of CLARANS with that of the proposed STCLARanS. The Slim-tree method is used to pre-cluster the objects in the dataset, and the objects in the middle level of the tree are taken as the sample objects that start the clustering process. The proposed algorithm rests on the assumption that this new sampling strategy for drawing the initial cluster centers may yield clustered output of better quality than that of CLARANS. The quality of the clustered output is measured by the accumulated distances of the objects to their cluster centers.
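The quality measure and the randomized medoid search described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: it shows a plain CLARANS-style search under the assumption of Euclidean distance, and the Slim-tree pre-clustering step is not implemented. The function names `total_cost` and `clarans` and the `init` parameter are hypothetical; `init` only marks where starting objects chosen by a pre-clustering step (such as the middle-level Slim-tree objects) could be supplied. The parameter names `numlocal` and `maxneighbor` follow the usual CLARANS terminology.

```python
import random
from math import dist  # Euclidean distance (assumed metric), Python 3.8+


def total_cost(points, medoid_idx):
    """Accumulated distance of every object to its nearest medoid."""
    return sum(min(dist(p, points[m]) for m in medoid_idx) for p in points)


def clarans(points, k, numlocal=2, maxneighbor=50, init=None, seed=0):
    """CLARANS-style randomized search; returns (medoid indices, cost).

    Requires len(points) > k. `init` is a hypothetical hook for starting
    medoids produced by some pre-clustering step; if it is None, the
    starting medoids are drawn uniformly at random.
    """
    rng = random.Random(seed)
    n = len(points)
    best, best_cost = None, float("inf")
    for _ in range(numlocal):
        current = list(init) if init is not None else rng.sample(range(n), k)
        cost = total_cost(points, current)
        tried = 0
        while tried < maxneighbor:
            # A neighbor differs from the current solution in exactly one medoid.
            pos = rng.randrange(k)
            candidate = rng.choice([i for i in range(n) if i not in current])
            neighbor = current[:pos] + [candidate] + current[pos + 1:]
            neighbor_cost = total_cost(points, neighbor)
            if neighbor_cost < cost:
                # Move to the better neighbor and restart the neighbor count.
                current, cost, tried = neighbor, neighbor_cost, 0
            else:
                tried += 1
        if cost < best_cost:
            best, best_cost = sorted(current), cost
    return best, best_cost
```

Comparing the cost returned for a random start with the cost returned when `init` is set to a pre-selected sample gives the kind of comparison the abstract describes, where a lower accumulated distance indicates better clustered output.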