Clustering Very Large Data Sets Using a Low Memory Matrix Factored Representation
Author(s) - David Littau, Daniel Boley
Publication year - 2009
Publication title - Computational Intelligence
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.353
H-Index - 52
eISSN - 1467-8640
pISSN - 0824-7935
DOI - 10.1111/j.1467-8640.2009.00331.x
Subject(s) - cluster analysis, computer science, algorithm, representation, data matrix, cluster, sample, matrix, scalability, single linkage clustering, pattern recognition, mathematics, data mining, CURE data clustering algorithm, fuzzy clustering, artificial intelligence, database, programming language
A scalable method is presented for clustering data sets too large to fit in memory. The method does not rely on random subsampling; instead, it scans every individual data sample in a deterministic way. The original data are represented in factored form as the product of two matrices, one or both of which is very sparse. A variant of the Principal Direction Divisive Partitioning (PDDP) algorithm, which does not depend on computing distances between individual samples, is applied directly to this factored form, so the two matrices never need to be multiplied together. The resulting clustering algorithm is Piecemeal PDDP (PMPDDP): the original data are broken into sections small enough to fit in memory, and each section is clustered. The cluster centers are then used to approximate the original data items, with each item represented by a linear combination of these centers. We evaluate PMPDDP on three real data sets and observe that, for the data sets examined, the quality of the clusters produced by PMPDDP is comparable to that of PDDP.
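The two ideas in the abstract (represent the data as a product C·Z with a sparse Z, then run a PDDP split on that product without ever forming it) can be illustrated with a small NumPy sketch. Several details here are assumptions, not taken from the abstract: each in-memory section is clustered with plain PDDP that always bisects the largest cluster; each data item is approximated by least squares on its `kz` nearest section centers; the sparse Z is stored as index/coefficient arrays; and the principal direction is found by power iteration. All function names (`section_centers`, `factor_section`, `factored_pddp_split`, `pmpddp_split`) are hypothetical, and this is not presented as the authors' implementation.

```python
import numpy as np


def pddp_split(X):
    """One PDDP bisection of the columns of a dense matrix X (m x s).
    Returns a boolean mask: True for columns on the positive side."""
    w = X.mean(axis=1, keepdims=True)                 # centroid of the columns
    U, _, _ = np.linalg.svd(X - w, full_matrices=False)
    return U[:, 0] @ (X - w) > 0                      # sign of projection on principal direction


def section_centers(X, n_centers):
    """Cluster one in-memory section with PDDP (assumption: always split the
    largest cluster) and return the centroids as columns (m x <= n_centers)."""
    clusters = [np.arange(X.shape[1])]
    while len(clusters) < n_centers:
        big = max(range(len(clusters)), key=lambda i: len(clusters[i]))
        members = clusters[big]
        if len(members) < 2:
            break                                     # nothing left to split
        mask = pddp_split(X[:, members])
        if mask.all() or not mask.any():
            break                                     # degenerate split, stop early
        clusters[big:big + 1] = [members[mask], members[~mask]]
    return np.column_stack([X[:, c].mean(axis=1) for c in clusters])


def factor_section(X, C, kz):
    """Approximate each column of X by least squares on its kz nearest centers
    in C. Column j of X ~ C[:, idx[:, j]] @ coef[:, j]; idx, coef are kz x s."""
    d2 = ((X[:, None, :] - C[:, :, None]) ** 2).sum(axis=0)   # (k, s) squared distances
    idx = np.argsort(d2, axis=0)[:kz]                          # kz nearest centers per column
    coef = np.empty(idx.shape)
    for j in range(X.shape[1]):
        coef[:, j] = np.linalg.lstsq(C[:, idx[:, j]], X[:, j], rcond=None)[0]
    return idx, coef


def factored_pddp_split(C, idx, coef, iters=100, seed=0):
    """PDDP bisection of A = C @ Z without forming A, where the sparse Z
    (k x n) has Z[idx[r, j], j] = coef[r, j] and is zero elsewhere."""
    k, n = C.shape[1], idx.shape[1]

    def z_mul(y):                                     # Z @ y: length n -> length k
        out = np.zeros(k)
        np.add.at(out, idx, coef * y)
        return out

    def zt_mul(u):                                    # Z.T @ u: length k -> length n
        return (coef * u[idx]).sum(axis=0)

    w = C @ z_mul(np.full(n, 1.0 / n))                # centroid of the columns of A

    def ahat_t(x):                                    # (A - w 1^T)^T x
        return zt_mul(C.T @ x) - w @ x

    def ahat(y):                                      # (A - w 1^T) y
        return C @ z_mul(y) - w * y.sum()

    x = np.random.default_rng(seed).standard_normal(C.shape[0])
    for _ in range(iters):                            # power iteration on Ahat Ahat^T
        x = ahat(ahat_t(x))
        x /= np.linalg.norm(x)
    return ahat_t(x) > 0                              # sign of each item's projection


def pmpddp_split(sections, n_centers, kz):
    """Hypothetical driver: cluster and factor one section at a time, then
    split all items once using only the factored form."""
    Cs, idxs, coefs, offset = [], [], [], 0
    for X in sections:                                # only one section needed in memory
        C = section_centers(X, n_centers)
        idx, coef = factor_section(X, C, kz)
        Cs.append(C)
        idxs.append(idx + offset)                     # Z is block diagonal across sections
        coefs.append(coef)
        offset += C.shape[1]
    return factored_pddp_split(np.hstack(Cs), np.hstack(idxs), np.hstack(coefs))
```

Because the split only needs products with `C`, `C.T`, and the sparse `Z`, memory scales with the number of centers and nonzeros rather than with the full data matrix. A full clustering would apply the factored split recursively to the resulting subsets; this sketch shows a single bisection.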
