Determining the Number of Clusters Using the Weighted Gap Statistic
Author(s) - Yan Mingjin, Ye Keying
Publication year - 2007
Publication title - Biometrics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 2.298
H-Index - 130
eISSN - 1541-0420
pISSN - 0006-341X
DOI - 10.1111/j.1541-0420.2007.00784.x
Subject(s) - statistic, statistics, computer science, mathematics, econometrics
Summary Estimating the number of clusters in a data set is a crucial step in cluster analysis. In this article, motivated by the gap method (Tibshirani, Walther, and Hastie, 2001, Journal of the Royal Statistical Society B 63, 411–423), we propose the weighted gap and the difference of difference‐weighted (DD‐weighted) gap methods for estimating the number of clusters in data using the weighted within‐clusters sum of errors, a measure of within‐cluster homogeneity. In addition, we propose a “multilayer” clustering approach, which is shown to be more accurate than the original gap method, particularly in detecting nested cluster structure in the data. The methods are applicable when the input data contain continuous measurements and can be used with any clustering method. The proposed methods are compared with one another and with the original gap method on simulated and real data sets.
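
The summary names the gap statistic and a weighted within-cluster dispersion but gives no formulas. The Python sketch below (NumPy only) shows one way to compute both curves, under these assumptions, none of which come from this page: the unweighted dispersion is Tibshirani et al.'s W_k = sum over clusters r of D_r / (2 n_r), where D_r is the sum of squared Euclidean distances over all ordered pairs in cluster r; the weighted version divides each D_r by 2 n_r (n_r - 1) instead, which is our reading of "weighted" and should be checked against the article; reference data are drawn uniformly over the data's bounding box; plain k-means is the clustering method; and Tibshirani et al.'s one-standard-error rule selects k for both curves (the article's own rule for the weighted and DD-weighted gaps may differ). The function names `kmeans_labels`, `within_dispersion`, `gap_curve`, and `choose_k` are hypothetical. The DD-weighted gap and the multilayer approach are not shown.

```python
import numpy as np


def kmeans_labels(X, k, n_init=10, n_iter=100, seed=0):
    """Lloyd's k-means with k-means++ seeding and restarts; returns best labels.

    Any clustering method could be used instead; k-means is only an example.
    """
    rng = np.random.default_rng(seed)
    best_labels, best_inertia = None, np.inf
    for _ in range(n_init):
        # k-means++ seeding: new centers drawn with prob. proportional to squared distance.
        centers = [X[rng.integers(len(X))]]
        for _ in range(1, k):
            d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
            centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
        centers = np.array(centers)
        for _ in range(n_iter):
            dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = dist.argmin(axis=1)
            # Recompute centers; keep the old one if a cluster becomes empty.
            new_centers = np.array([
                X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                for j in range(k)
            ])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        inertia = dist.min(axis=1).sum()
        if inertia < best_inertia:
            best_labels, best_inertia = labels, inertia
    return best_labels


def within_dispersion(X, labels, weighted=False):
    """Sum over clusters of D_r / (2 n_r), or D_r / (2 n_r (n_r - 1)) if weighted.

    D_r = sum of squared Euclidean distances over all ordered pairs in cluster r.
    ASSUMPTION: the weighted form is our reading of the article, not a quote.
    """
    total = 0.0
    for r in np.unique(labels):
        pts = X[labels == r]
        n_r = len(pts)
        d_r = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum()
        if not weighted:
            total += d_r / (2 * n_r)
        elif n_r > 1:  # a singleton cluster contributes nothing
            total += d_r / (2 * n_r * (n_r - 1))
    return total


def gap_curve(X, k_max, B=10, weighted=False, seed=0):
    """Gap(k) = mean_b log W*_kb - log W_k and s_k, for k = 1..k_max."""
    rng = np.random.default_rng(seed)
    lo, hi = X.min(axis=0), X.max(axis=0)
    gaps, s = [], []
    for k in range(1, k_max + 1):
        log_w = np.log(within_dispersion(X, kmeans_labels(X, k), weighted))
        ref = np.empty(B)
        for b in range(B):
            # Reference data: uniform over the bounding box of the observed data.
            Z = rng.uniform(lo, hi, size=X.shape)
            ref[b] = np.log(within_dispersion(Z, kmeans_labels(Z, k, seed=b), weighted))
        gaps.append(ref.mean() - log_w)
        s.append(ref.std() * np.sqrt(1 + 1 / B))
    return np.array(gaps), np.array(s)


def choose_k(gaps, s):
    """Smallest k with Gap(k) >= Gap(k+1) - s_{k+1} (Tibshirani et al.'s rule)."""
    for k in range(1, len(gaps)):
        if gaps[k - 1] >= gaps[k] - s[k]:
            return k
    return len(gaps)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Three well-separated Gaussian blobs in 2-D.
    X = np.vstack([rng.normal(c, 0.3, size=(30, 2)) for c in [(0, 0), (5, 0), (0, 5)]])
    print(choose_k(*gap_curve(X, 4)), choose_k(*gap_curve(X, 4, weighted=True)))
```

Passing the clustering function's labels into `within_dispersion` keeps the dispersion measure independent of the clustering algorithm, in line with the summary's remark that the methods can be used with any clustering method.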
