Open Access
Hierarchical model-based clustering of large datasets through fractionation and refractionation
Author(s) - Jeremy Tantrum, Alejandro Murua, Werner Stuetzle
Publication year - 2002
Publication title - CiteSeerX (The Pennsylvania State University)
Language(s) - English
Resource type - Conference proceedings
DOI - 10.1145/775047.775074
Subject(s) - cluster analysis, computer science, hierarchical clustering, single linkage clustering, parametric statistics, data mining, correlation clustering, CURE data clustering algorithm, artificial intelligence, mathematics, statistics
The goal of clustering is to identify distinct groups in a dataset. Compared to non-parametric clustering methods like complete linkage, hierarchical model-based clustering has the advantage of offering a way to estimate the number of groups present in the data. However, its computational cost is quadratic in the number of items to be clustered, and it is therefore not applicable to large problems. We review an idea called Fractionation, originally conceived by Cutting, Karger, Pedersen and Tukey for non-parametric hierarchical clustering of large datasets, and describe an adaptation of Fractionation to model-based clustering. A further extension, called Refractionation, leads to a procedure that can be successful even in the difficult situation where there are large numbers of small groups.
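
The Fractionation idea can be illustrated with a minimal Python sketch. This is not the authors' implementation, and several details are assumptions:

* Ward's sum-of-squares merge cost stands in for the full model-based criterion. It is the classification-likelihood criterion for the simplest Gaussian model, with equal spherical covariances.
* On the first pass, fractions are consecutive chunks of a random ordering.
* Each fraction of m objects is reduced to ceil(rho * m) meta-observations.
* Refractionation is modeled by re-forming fractions from the previous pass's final clusters.
* The number of groups k is supplied rather than estimated.

The names `Group`, `fractionate`, `refractionate`, `fraction_size` and `rho` are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


class Group:
    """A meta-observation: original item indices plus sufficient statistics."""

    def __init__(self, members: List[int], sums: List[float]):
        self.members = members  # indices into the original dataset
        self.sums = sums        # per-coordinate sums, so the centroid is sums / n

    @property
    def n(self) -> int:
        return len(self.members)

    def centroid(self) -> List[float]:
        return [s / self.n for s in self.sums]


def ward_cost(a: Group, b: Group) -> float:
    """Increase in within-group sum of squares caused by merging a and b."""
    d2 = sum((x - y) ** 2 for x, y in zip(a.centroid(), b.centroid()))
    return a.n * b.n / (a.n + b.n) * d2


def agglomerate(groups: List[Group], target: int) -> List[Group]:
    """Greedy hierarchical agglomeration down to `target` groups (full pair search per merge)."""
    groups = list(groups)
    while len(groups) > max(target, 1):
        i, j = min(
            ((p, q) for p in range(len(groups)) for q in range(p + 1, len(groups))),
            key=lambda pq: ward_cost(groups[pq[0]], groups[pq[1]]),
        )
        a, b = groups[i], groups[j]
        merged = Group(a.members + b.members, [x + y for x, y in zip(a.sums, b.sums)])
        groups = [g for idx, g in enumerate(groups) if idx not in (i, j)] + [merged]
    return groups


def fractionate(points: Sequence[Point], order: List[int], fraction_size: int,
                rho: float, k: int) -> List[Group]:
    """One Fractionation pass: shrink fixed-size fractions until one fraction remains, then cluster it."""
    if fraction_size < 2 or math.ceil(rho * fraction_size) >= fraction_size:
        raise ValueError("need fraction_size >= 2 and ceil(rho * fraction_size) < fraction_size")
    groups = [Group([i], list(points[i])) for i in order]
    while len(groups) > fraction_size:
        reduced: List[Group] = []
        for start in range(0, len(groups), fraction_size):
            fraction = groups[start:start + fraction_size]
            # Each fraction of size m is reduced to ceil(rho * m) meta-observations.
            reduced.extend(agglomerate(fraction, math.ceil(rho * len(fraction))))
        groups = reduced
    # Few enough meta-observations remain to cluster them all at once.
    return agglomerate(groups, k)


def refractionate(points: Sequence[Point], fraction_size: int, rho: float, k: int,
                  passes: int = 2, seed: int = 0) -> List[Group]:
    """Repeat Fractionation, forming each pass's fractions from the previous pass's clusters."""
    order = list(range(len(points)))
    random.Random(seed).shuffle(order)  # first pass: random fractions
    groups = fractionate(points, order, fraction_size, rho, k)
    for _ in range(passes - 1):
        # Items clustered together last time become contiguous, so they share fractions.
        order = [i for g in groups for i in g.members]
        groups = fractionate(points, order, fraction_size, rho, k)
    return groups


# Usage: four well-separated square blobs of 20 points each (item i belongs to blob i // 20).
rng = random.Random(1)
centers = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0), (100.0, 100.0)]
points = [(cx + rng.uniform(-1, 1), cy + rng.uniform(-1, 1)) for cx, cy in centers for _ in range(20)]
groups = refractionate(points, fraction_size=20, rho=0.25, k=4)
print(sorted(g.n for g in groups))  # prints [20, 20, 20, 20]
```

Each call to `agglomerate` sees at most `fraction_size` objects. The quadratic cost of hierarchical clustering is therefore paid per fraction rather than over the whole dataset, and this is what lets the scheme scale. The trade-off is that a true group spread across several fractions may first appear as several meta-observations, which later merges (or another refractionation pass) must reunite.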
