
A Novel Improved K-Means Algorithm Based on Parameter Adaptive Selection
Author(s) - Xiaodi Huang, Minglun Ren, Xiaoxi Zhu
Publication year - 2020
Publication title - Journal of Physics: Conference Series
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.21
H-Index - 85
eISSN - 1742-6596
pISSN - 1742-6588
DOI - 10.1088/1742-6596/1549/4/042005
As a classical clustering algorithm, K-means has been widely applied because of its simple underlying idea, fast convergence, low complexity, and ease of implementation. However, K-means requires users to set the desired number of clusters in advance, and the initial cluster centers are usually generated at random. When users lack sufficient domain knowledge about an unknown dataset, such parameter-setting strategies not only increase the burden on users but also make clustering quality difficult to guarantee. Therefore, given the high sensitivity of the K-means clustering process to its initial parameters, this paper proposes an improved algorithm, DDWK-means (Distance-Density-Weight K-means). Based on the distance-density feature and the inertia-weight method of the particle swarm optimization algorithm, the optimal initial cluster centers can not only be determined adaptively from the structural characteristics of the dataset itself, without introducing artificial parameters, but also be adjusted dynamically as the threshold of the clustering quality metric changes. We conducted experiments on five standard datasets from the UCI (University of California, Irvine) repository, and the results indicate that the DDWK-means algorithm significantly improves clustering efficiency and stability.
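The abstract does not give the DDWK-means formulas. The sketch below illustrates the general idea of choosing initial centers from distance and density rather than at random, using a generic density-times-distance heuristic. It is not the authors' method. It rests on several assumptions. A point's density is the number of other points closer than the mean pairwise distance, so no user-set radius is needed. The first center is the densest point. Each later center maximizes its density multiplied by its distance to the nearest center already chosen. The number of clusters `k` is still supplied by hand, and the particle-swarm inertia-weight adjustment described in the abstract is not modeled. The names `density_distance_init` and `kmeans` are hypothetical.

```python
import math


def density_distance_init(points, k):
    """Hypothetical density-times-distance seeding (not the paper's DDWK-means).

    Density of a point = number of other points closer than the mean
    pairwise distance. Assumes 2 <= k <= len(points).
    """
    n = len(points)
    pair = [[math.dist(p, q) for q in points] for p in points]
    mean_d = sum(sum(row) for row in pair) / (n * (n - 1))
    density = [sum(1 for j in range(n) if j != i and pair[i][j] < mean_d)
               for i in range(n)]

    # First center: the densest point (max() keeps the first index on ties).
    chosen = [max(range(n), key=lambda i: density[i])]

    def weight(i):
        # Dense points far from every chosen center score highest.
        return density[i] * min(pair[i][c] for c in chosen)

    while len(chosen) < k:
        chosen.append(max((i for i in range(n) if i not in chosen), key=weight))
    return [points[i] for i in chosen]


def kmeans(points, centers, max_iter=100):
    """Standard Lloyd iterations starting from the given initial centers."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            clusters[nearest].append(p)
        new_centers = []
        for old, members in zip(centers, clusters):
            if members:
                # Mean of each coordinate over the cluster's members.
                new_centers.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centers.append(old)  # keep an empty cluster's center in place
        if new_centers == centers:
            break
        centers = new_centers
    labels = [min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
              for p in points]
    return centers, labels


if __name__ == "__main__":
    # Three well-separated 2x2 grids of points.
    pts = [(0, 0), (0, 1), (1, 0), (1, 1),
           (20, 0), (20, 1), (21, 0), (21, 1),
           (0, 20), (0, 21), (1, 20), (1, 21)]
    centers, labels = kmeans(pts, density_distance_init(pts, 3))
    print(sorted(centers))  # [(0.5, 0.5), (0.5, 20.5), (20.5, 0.5)]
```

On this toy data the heuristic places one initial center in each group, so the Lloyd iterations converge to the three group means. A purely random initialization could instead put two initial centers in the same group. The sketch uses only the standard library (`math.dist` needs Python 3.8 or later). Its O(n²) distance matrix is fine for illustration but not for large datasets.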