On the strong consistency of feature-weighted k-means clustering in a nearmetric space
Author(s) - Saptarshi Chakraborty, Swagatam Das
Publication year - 2019
Publication title -
stat
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.61
H-Index - 18
ISSN - 2049-1573
DOI - 10.1002/sta4.227
Subject(s) - cluster analysis, consistency, independent and identically distributed random variables, feature weighting, k-means clustering, dissimilarity measure, data mining, pattern recognition, artificial intelligence, statistics, mathematics, computer science, algorithm
Weighted k-means (WK-means) is a well-known method for automated feature weight learning within the conventional k-means clustering framework. In this paper, we analytically explore the strong consistency of the WK-means algorithm under independent and identically distributed sampling of the data points. The choice of dissimilarity measure plays a key role in partitioning the data and detecting the inherent groups in a dataset. We present a proof of the strong consistency of the WK-means algorithm when the dissimilarity measure is assumed to be a nearmetric. The proof extends further to dissimilarity measures that are increasing functions of a nearmetric. Through detailed experiments, we demonstrate that WK-means-type algorithms, equipped with a nearmetric, can be quite effective, especially when some of the features are unimportant for revealing the cluster structure of the dataset.
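To make the idea concrete, the following is a minimal sketch of a WK-means-style iteration, not the authors' exact algorithm. It uses the squared per-feature difference as the dissimilarity (a standard example of a nearmetric: it satisfies the relaxed triangle inequality d(x, z) ≤ 2(d(x, y) + d(y, z))) and the closed-form weight update from the classical WK-means literature, in which features with small within-cluster dispersion receive large weights. The initialisation scheme and the exponent `beta` are illustrative choices.

```python
import numpy as np

def wk_means(X, k, beta=2.0, iters=50):
    """Sketch of a WK-means-style loop with a squared-difference
    (nearmetric) per-feature dissimilarity and automatic feature
    weight learning. Illustrative only, not the paper's algorithm."""
    n, p = X.shape
    # Deterministic farthest-first initialisation of the k centers.
    centers = [X[0]]
    for _ in range(k - 1):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d2)])
    centers = np.array(centers)
    w = np.full(p, 1.0 / p)  # feature weights, kept summing to 1
    for _ in range(iters):
        # Assignment step: weighted per-feature squared dissimilarity.
        d = (X[:, None, :] - centers[None, :, :]) ** 2       # (n, k, p)
        labels = np.argmin((d * w ** beta).sum(axis=2), axis=1)
        # Update step: each center is the mean of its cluster.
        for j in range(k):
            pts = X[labels == j]
            if len(pts):
                centers[j] = pts.mean(axis=0)
        # Weight update: features with small within-cluster
        # dispersion D_j get larger weight, w_j ∝ (1/D_j)^(1/(beta-1)).
        D = np.array([((X[labels == j] - centers[j]) ** 2).sum(axis=0)
                      for j in range(k)]).sum(axis=0)         # (p,)
        D = np.maximum(D, 1e-12)                              # guard /0
        w = (1.0 / D) ** (1.0 / (beta - 1.0))
        w /= w.sum()
    return labels, centers, w
```

On data where only some features carry the cluster structure, the learned weights concentrate on the informative features, which is the setting in which the experiments above report the nearmetric-equipped WK-means variants to be effective.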