
Performance Improvement of Clustering Affinity Propagation Method using Principal Component Analysis
Author(s) -
Jasael Simanullang,
Muhammad Zarlis,
Elviawaty Muisa Zamzami
Publication year - 2020
Publication title -
Journal of Physics: Conference Series
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.21
H-Index - 85
eISSN - 1742-6596
pISSN - 1742-6588
DOI - 10.1088/1742-6596/1566/1/012126
Subject(s) - principal component analysis , cluster analysis , affinity propagation , determining the number of clusters in a data set , cluster (spacecraft) , computer science , pattern recognition (psychology) , mathematics , data mining , artificial intelligence , correlation clustering , cure data clustering algorithm , programming language
To improve the performance of the Affinity Propagation (AP) clustering method, it is necessary to modify the algorithm by using Principal Component Analysis (PCA). PCA is used to remove the attributes that have little influence on the data, so that only the most influential attributes are passed to the Affinity Propagation clustering step.

On the Air Quality dataset, the PCA + AP clustering model performs better than the conventional AP clustering model. The number of iterations and clusters produced by the PCA + AP model stops changing and converges at an optimal 8 clusters, whereas the conventional model produces 14 clusters with a significant number of iterations. It can be concluded that the PCA + AP model is suitable for the Air Quality dataset, because it converges to an optimal 8 clusters.

On the Water Quality Status dataset, the PCA + AP clustering model also performs better than the conventional AP clustering model. The number of iterations and clusters produced by the PCA + AP model stops changing and converges at an optimal 5 clusters, whereas the conventional model produces a suboptimal 10 clusters with a significant number of iterations. It can be concluded that the PCA + AP model is suitable for the Water Quality Status dataset, because it converges to an optimal 5 clusters.
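The comparison described in the abstract, conventional AP against AP preceded by PCA, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes scikit-learn, a synthetic stand-in dataset from `make_blobs` (the Air Quality and Water Quality Status data are not reproduced here), standardized features, and a hypothetical 90% explained-variance threshold for choosing the number of principal components. The helper names `cluster_with_ap` and `cluster_with_pca_ap` are illustrative only.

```python
from sklearn.cluster import AffinityPropagation
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def cluster_with_ap(X, random_state=0):
    """Conventional AP: return (number of clusters, iterations used)."""
    ap = AffinityPropagation(max_iter=500, random_state=random_state).fit(X)
    # cluster_centers_indices_ holds the exemplar indices (empty if AP did not converge).
    return len(ap.cluster_centers_indices_), ap.n_iter_


def cluster_with_pca_ap(X, variance_kept=0.90, random_state=0):
    """PCA + AP: reduce attributes first, then cluster the reduced data.

    variance_kept is an assumed threshold, not a value taken from the paper.
    Returns (components kept, number of clusters, iterations used).
    """
    pca = PCA(n_components=variance_kept, svd_solver="full")
    X_reduced = pca.fit_transform(X)
    n_clusters, n_iter = cluster_with_ap(X_reduced, random_state)
    return X_reduced.shape[1], n_clusters, n_iter


# Synthetic stand-in data (assumption): 200 samples, 10 attributes.
X, _ = make_blobs(n_samples=200, n_features=10, centers=4, random_state=0)
X = StandardScaler().fit_transform(X)

k_raw, it_raw = cluster_with_ap(X)
n_kept, k_pca, it_pca = cluster_with_pca_ap(X)
print(f"AP only : {k_raw} clusters in {it_raw} iterations")
print(f"PCA + AP: {k_pca} clusters in {it_pca} iterations ({n_kept} components kept)")
```

The cluster and iteration counts printed by this sketch depend on the synthetic data and on AP's default `preference` and `damping` settings, so they should not be expected to match the figures reported for the paper's datasets.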