Estimating the number of clusters
Author(s) - Cuevas Antonio, Febrero Manuel, Fraiman Ricardo
Publication year - 2000
Publication title - Canadian Journal of Statistics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.804
H-Index - 51
eISSN - 1708-945X
pISSN - 0319-5724
DOI - 10.2307/3315985
Subject(s) - estimator , nonparametric statistics , computation , mathematics , constant (computer programming) , random variate , cluster (spacecraft) , set (abstract data type) , function (biology) , population , data set , algorithm , probability density function , statistics , computer science , combinatorics , random variable , demography , sociology , evolutionary biology , biology , programming language
Hartigan (1975) defines the number q of clusters in a d-variate statistical population as the number of connected components of the set {f > c}, where f denotes the underlying density function on R^d and c is a given constant. Some usual cluster algorithms treat q as an input which must be given in advance. The authors propose a method for estimating this parameter which is based on the computation of the number of connected components of an estimate of {f > c}. This set estimator is constructed as a union of balls with centres at an appropriate subsample which is selected via a nonparametric density estimator of f. The asymptotic behaviour of the proposed method is analyzed. A simulation study and an example with real data are also included.
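
The idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Gaussian kernel, the bandwidth `h`, the ball radius `eps`, and the function names `gaussian_kde` and `count_clusters` are all assumptions made for illustration, since the abstract does not specify these choices. The sketch keeps the sample points whose estimated density exceeds c, places a ball of radius `eps` at each, and counts the connected components of the union (two balls overlap when their centres are at most `2 * eps` apart).

```python
import math


def gaussian_kde(sample, h):
    """Return a function evaluating a Gaussian kernel density estimate.

    Assumption: isotropic Gaussian kernel with a single bandwidth h.
    """
    n, d = len(sample), len(sample[0])
    norm = n * (h * math.sqrt(2.0 * math.pi)) ** d

    def f_hat(x):
        total = 0.0
        for p in sample:
            sq = sum((a - b) ** 2 for a, b in zip(x, p))
            total += math.exp(-sq / (2.0 * h * h))
        return total / norm

    return f_hat


def count_clusters(sample, c, h, eps):
    """Estimate q as the number of connected components of a union of balls.

    The balls have radius eps (hypothetical parameter) and are centred at
    the sample points whose estimated density exceeds the level c.
    """
    f_hat = gaussian_kde(sample, h)
    centres = [p for p in sample if f_hat(p) > c]

    # Union-find over the selected centres.
    parent = list(range(len(centres)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(centres)):
        for j in range(i + 1, len(centres)):
            # Two closed balls of radius eps intersect iff the distance
            # between their centres is at most 2 * eps.
            if math.dist(centres[i], centres[j]) <= 2.0 * eps:
                parent[find(i)] = find(j)

    return len({find(i) for i in range(len(centres))})


# Two tight groups of points in the plane, far apart from each other.
group_a = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
group_b = [(5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1)]
print(count_clusters(group_a + group_b, c=0.01, h=0.5, eps=0.2))
```

The pairwise loop is quadratic in the number of selected centres, which is acceptable for a sketch; a spatial index would be a natural replacement for larger samples. The result depends on the level c as well as on `h` and `eps`: raising c discards low-density points, such as isolated outliers, before the components are counted.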
