Estimating the number of clusters
Author(s) - Cuevas Antonio, Febrero Manuel, Fraiman Ricardo
Publication year - 2000
Publication title - Canadian Journal of Statistics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.804
H-Index - 51
eISSN - 1708-945X
pISSN - 0319-5724
DOI - 10.2307/3315985
Subject(s) - estimator, nonparametric statistics, computation, mathematics, constant (computer programming), random variate, cluster (spacecraft), set (abstract data type), function (biology), population, data set, algorithm, probability density function, statistics, computer science, combinatorics, random variable, demography, sociology, evolutionary biology, biology, programming language
Hartigan (1975) defines the number q of clusters in a d-variate statistical population as the number of connected components of the set {f > c}, where f denotes the underlying density function on R^d and c is a given constant. Some usual cluster algorithms treat q as an input which must be given in advance. The authors propose a method for estimating this parameter which is based on the computation of the number of connected components of an estimate of {f > c}. This set estimator is constructed as a union of balls with centres at an appropriate subsample which is selected via a nonparametric density estimator of f. The asymptotic behaviour of the proposed method is analyzed. A simulation study and an example with real data are also included.
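
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which density estimator, subsample rule, or ball radius the paper uses, so the Gaussian kernel, the rule "keep sample points whose estimated density exceeds c", and the names `gaussian_kde`, `count_clusters`, `h` (bandwidth) and `eps` (ball radius) are all assumptions made here for illustration. Two closed balls of radius `eps` intersect exactly when their centres are at most `2 * eps` apart, so the connected components of the union of balls can be counted with a union-find over the centres.

```python
import math


def gaussian_kde(points, h):
    """Gaussian product-kernel density estimate with bandwidth h (assumed choice)."""
    n, d = len(points), len(points[0])
    norm = n * (h * math.sqrt(2.0 * math.pi)) ** d

    def f_hat(x):
        total = 0.0
        for p in points:
            sq = sum((xi - pi) ** 2 for xi, pi in zip(x, p))
            total += math.exp(-sq / (2.0 * h * h))
        return total / norm

    return f_hat


def count_clusters(points, c, h, eps):
    """Count connected components of a union of balls estimating {f > c}.

    Hypothetical helper: centres are the sample points whose estimated
    density exceeds c; each centre carries a closed ball of radius eps.
    """
    f_hat = gaussian_kde(points, h)
    centres = [p for p in points if f_hat(p) > c]

    # Union-find over the centres.
    parent = list(range(len(centres)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(centres)):
        for j in range(i + 1, len(centres)):
            # Balls of radius eps overlap iff the centres are within 2 * eps.
            if math.dist(centres[i], centres[j]) <= 2.0 * eps:
                parent[find(i)] = find(j)

    return len({find(i) for i in range(len(centres))})


# Toy data: two tight 5 x 5 grids far apart, plus one isolated point.
grid = [(0.1 * i, 0.1 * j) for i in range(5) for j in range(5)]
sample = grid + [(x + 10.0, y) for x, y in grid] + [(5.0, 5.0)]

print(count_clusters(sample, c=0.1, h=0.3, eps=0.1))
```

On this toy sample the isolated point has an estimated density below c = 0.1 and is dropped, so the two grids remain as separate components. Lowering c far enough to keep the isolated point adds a third component, which shows how strongly the estimate of q depends on the level c, as in Hartigan's definition. The pairwise loop is quadratic in the number of centres and is meant only for small examples.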
