
Evolutionary clustering framework based on distance matrix for arbitrary‐shaped data sets
Author(s) -
Liu Cong,
Wu Chunxue,
Jiang Linhua
Publication year - 2016
Publication title -
IET Signal Processing
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.384
H-Index - 42
eISSN - 1751-9683
pISSN - 1751-9675
DOI - 10.1049/iet-spr.2015.0335
Subject(s) - cluster analysis , k medians clustering , correlation clustering , complete linkage clustering , single linkage clustering , distance matrix , cure data clustering algorithm , fuzzy clustering , euclidean distance , computer science , metric (unit) , clustering high dimensional data , determining the number of clusters in a data set , data mining , consensus clustering , mathematics , constrained clustering , pattern recognition (psychology) , data stream clustering , artificial intelligence , algorithm , operations management , economics
Data clustering plays a key role in both scientific and real-world applications. However, current clustering methods still face challenges such as clustering arbitrary-shaped data sets and detecting the cluster number automatically. This study addresses these two challenges. A novel clustering analysis method, named automatic evolutionary clustering method based on a distance matrix (AED), is proposed to determine the proper cluster number automatically and to find the optimal clustering result. In AED, a distance matrix is first obtained by using a specific distance metric, such as the Euclidean distance metric or the path distance metric, and then this distance matrix is partitioned by an evolutionary clustering framework. In this framework, a fixed-length representation scheme is implemented to represent the clustering result, a novel crossover scheme is introduced to increase the convergence speed, and a validity index is proposed to evaluate both intermediate and final clustering results. AED is systematically compared with some state-of-the-art clustering methods on both hyper-spherical and irregular-shaped data sets, and the experimental results suggest that the authors' approach not only successfully detects the correct cluster numbers but also achieves better accuracy on most of the test problems.
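The abstract describes three building blocks: a precomputed distance matrix, a fixed-length representation that assigns one cluster label to each data point, and a validity index used to rank candidate clusterings. A minimal sketch of these data structures follows, assuming the Euclidean distance metric; since the paper's crossover operator and validity index are not detailed in this abstract, a simple within-cluster versus between-cluster distance ratio is used as a hypothetical stand-in for the index.

```python
import numpy as np

def euclidean_distance_matrix(X):
    """Pairwise Euclidean distances for an n x d data matrix X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def random_individual(n_points, max_clusters, rng):
    """Fixed-length representation: one integer cluster label per data point."""
    k = rng.integers(2, max_clusters + 1)          # candidate cluster number
    return rng.integers(0, k, size=n_points)       # label vector of length n

def validity_index(labels, D):
    """Stand-in validity index (assumption, not the paper's index):
    mean intra-cluster distance divided by mean inter-cluster distance,
    computed directly from the distance matrix D (lower is better)."""
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(len(labels), dtype=bool)
    intra = D[same & off_diag]
    inter = D[~same]
    if intra.size == 0 or inter.size == 0:
        return np.inf
    return intra.mean() / inter.mean()

# Toy usage: two Gaussian blobs, a random population of label vectors,
# and selection of the individual with the best (lowest) index value.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (30, 2)), rng.normal(3, 0.3, (30, 2))])
D = euclidean_distance_matrix(X)
population = [random_individual(len(X), max_clusters=5, rng=rng) for _ in range(20)]
best = min(population, key=lambda ind: validity_index(ind, D))
print("cluster labels of best individual:", best)
```

Because every candidate solution is evaluated only through the distance matrix, the same framework can be driven by a path-based distance instead of the Euclidean one, which is what allows the method to handle arbitrary-shaped clusters; the selection and crossover steps of the full evolutionary loop are omitted here.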