Open Access
A K-means Optimization Algorithm Suitable for Fast Clustering of WebGIS Massive Data
Author(s) - Hao He, Bo Sun, Yan Yang, Jun Chen
Publication year - 2022
Publication title - Journal of Physics: Conference Series
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.21
H-Index - 85
eISSN - 1742-6596
pISSN - 1742-6588
DOI - 10.1088/1742-6596/2171/1/012069
Subject(s) - cluster analysis , data mining , computer science , cure data clustering algorithm , correlation clustering , stability (learning theory) , grid , k medians clustering , data stream clustering , set (abstract data type) , canopy clustering algorithm , determining the number of clusters in a data set , cluster (spacecraft) , algorithm , mathematics , artificial intelligence , machine learning , geometry , programming language
K-means is fast, which makes it suitable for clustering large-scale WebGIS geographic data. However, because K-means selects its initial cluster centers at random, its results are unstable and its clustering accuracy is poor. Some existing studies have improved clustering accuracy and stability, but at the cost of a large increase in clustering time. This article proposes GBK-means, a grid-based improvement of K-means that uses an adaptive grid method to obtain the initial cluster centers. First, the grid division parameters are determined from the distribution of the sample data; then the connected regions of dense grid cells are identified and their centers are computed, and the initial cluster centers are derived from these. Experimental results on a real WebGIS data set show that GBK-means clusters better and faster than K-means, K-means++, and the methods of references [2] and [3]. On average, its F value, accuracy, and adjusted Rand index (ARI) are 10.9%, 11%, and 11.2% higher, respectively, than those of K-means. Its average clustering time is 75.4%, 52.3%, 85.1%, and 91.1% faster than that of K-means, K-means++, reference [2], and reference [3], respectively.
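The abstract gives no implementation details, so the following is only a minimal sketch of the general idea of grid-based initialization, not the authors' GBK-means. It assumes 2-D points, a fixed square cell size, and a fixed density threshold; the paper instead derives its grid parameters adaptively from the data distribution. The function name `grid_initial_centers` and the parameters `cell_size` and `min_pts` are hypothetical, as is the choice to keep the k most populated regions.

```python
from collections import defaultdict, deque
from math import floor


def grid_initial_centers(points, k, cell_size, min_pts):
    """Pick up to k initial centers from connected regions of dense grid cells.

    Illustrative only: cell_size and min_pts are fixed inputs here, whereas
    the paper obtains its grid parameters adaptively.
    """
    # Assign every 2-D point to a square grid cell.
    cells = defaultdict(list)
    for x, y in points:
        cells[(floor(x / cell_size), floor(y / cell_size))].append((x, y))

    # Assumed density rule: a cell is dense if it holds at least min_pts points.
    dense = {c for c, pts in cells.items() if len(pts) >= min_pts}

    # Group dense cells into 8-connected regions with breadth-first search.
    regions, seen = [], set()
    for start in sorted(dense):
        if start in seen:
            continue
        seen.add(start)
        queue, members = deque([start]), []
        while queue:
            cx, cy = queue.popleft()
            members.extend(cells[(cx, cy)])
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    nb = (cx + dx, cy + dy)
                    if nb in dense and nb not in seen:
                        seen.add(nb)
                        queue.append(nb)
        regions.append(members)

    # Keep the k most populated regions; the mean of each region is one center.
    regions.sort(key=len, reverse=True)
    return [
        (sum(p[0] for p in r) / len(r), sum(p[1] for p in r) / len(r))
        for r in regions[:k]
    ]
```

The returned centers would then replace the random starting points of an ordinary K-means run. If the data contain fewer than k dense regions, this sketch returns fewer than k centers, and the caller would need a fallback, such as filling the remainder with random samples.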
