
Research on K-means Algorithm Optimization based on Compression Learning
Author(s) -
Shuai Cai,
Lei Zhu,
Weijun Zeng,
Re Yu,
Zhao Xiao
Publication year - 2019
Publication title -
IOP Conference Series: Materials Science and Engineering
Language(s) - English
Resource type - Journals
eISSN - 1757-899X
pISSN - 1757-8981
DOI - 10.1088/1757-899x/569/5/052038
Subject(s) - cluster analysis , algorithm , sketch , computer science , data compression , canopy clustering algorithm , cure data clustering algorithm , set (abstract data type) , dimension (graph theory) , data set , centroid , ramer–douglas–peucker algorithm , population based incremental learning , k means clustering , matching (statistics) , computational complexity theory , compression (physics) , correlation clustering , mathematics , artificial intelligence , machine learning , genetic algorithm , statistics , materials science , pure mathematics , composite material , programming language
The K-means algorithm is one of the classical clustering algorithms. However, its computational cost grows as the data set becomes larger. Orthogonal matching pursuit is a classic signal reconstruction algorithm. This paper improves orthogonal matching pursuit on the basis of compression learning and applies it to K-means, so that the cluster centers are estimated from a sketch of the original data set rather than from the data itself. Experimental results show that the clustering quality of this method is similar to that of the K-means algorithm. Because the size of the sketch is independent of the size of the original data set and depends only on the number of centroids K and the data dimension n, the computational complexity of the algorithm is reduced. For large data sets, experiments show that the improved algorithm performs better than the traditional K-means algorithm.
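
The claim that the sketch size does not depend on the number of data points can be illustrated with a minimal sketch. The abstract does not specify the sketching operator, so the code below assumes one: the data set is summarized by averaging complex exponentials at m random frequencies. The names `compute_sketch`, `centroid_sketch`, the Gaussian frequency draw, and the factor 5 in m = 5·K·n are all hypothetical choices made for illustration, not the authors' method.

```python
import numpy as np

def compute_sketch(X, W):
    """Assumed sketch: mean of exp(-i * W x) over all rows x of X.

    X has shape (N, n) and W has shape (m, n); the result has shape (m,),
    whatever the number of points N is.
    """
    return np.exp(-1j * (X @ W.T)).mean(axis=0)

def centroid_sketch(C, alpha, W):
    """Sketch of K weighted centroids: sum_k alpha_k * exp(-i * W c_k)."""
    return alpha @ np.exp(-1j * (C @ W.T))

rng = np.random.default_rng(0)
K, n = 3, 2                    # number of centroids and data dimension
m = 5 * K * n                  # sketch size, assumed proportional to K * n
W = rng.normal(size=(m, n))    # hypothetical random frequencies

# Two data sets of very different size give sketches of the same length.
z_small = compute_sketch(rng.normal(size=(1_000, n)), W)
z_large = compute_sketch(rng.normal(size=(100_000, n)), W)
print(z_small.shape, z_large.shape)

# If every point sits exactly on a centroid, the data sketch equals the
# weighted centroid sketch; a greedy, matching-pursuit-style solver would
# search for centroids and weights that make the two sketches agree.
C = rng.normal(size=(K, n))
counts = np.array([10, 20, 30])
X = np.repeat(C, counts, axis=0)
alpha = counts / counts.sum()
print(np.allclose(compute_sketch(X, W), centroid_sketch(C, alpha, W)))
```

Under these assumptions, the data set is read once to build the sketch, and the later centroid estimation works only on an m-dimensional vector, which is why the cost of that step does not grow with the size of the data set.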