Active Sampling for Constrained Clustering
Author(s) -
Masayuki Okabe,
Seiji Yamada
Publication year - 2014
Publication title -
Journal of Advanced Computational Intelligence and Intelligent Informatics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.172
H-Index - 20
eISSN - 1343-0130
pISSN - 1883-8014
DOI - 10.20965/jaciii.2014.p0232
Subject(s) - cluster analysis , computer science , constrained clustering , correlation clustering , data mining , cure data clustering algorithm , sampling (signal processing) , k medians clustering , data stream clustering , set (abstract data type) , fuzzy clustering , determining the number of clusters in a data set , canopy clustering algorithm , process (computing) , artificial intelligence , machine learning , programming language , operating system , filter (signal processing) , computer vision
Constrained clustering is a framework for improving clustering performance by using constraints on pairs of data points. Since the performance of constrained clustering depends on the set of constraints used, a method is needed to select constraints that improve clustering performance. In this paper, we propose an active sampling method that works with a constrained cluster ensemble algorithm. The ensemble aggregates clustering results that a modified COP-Kmeans produces iteratively by changing the priorities of the constraints. Our method follows the uncertainty sampling approach and measures uncertainty by the variation across clustering results, that is, by how often a data pair is clustered together in some results but not in others. It selects for labeling the data pair whose result varies most during the cluster ensemble process. Experimental results show that our method outperforms random sampling. We further investigate the effect of important parameters.
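The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact uncertainty measure, so the sketch assumes a simple one, p(1 - p), where p is the fraction of ensemble members that place a pair in the same cluster. The function names `co_association` and `most_uncertain_pair` are hypothetical, and the ensemble is given as plain label lists rather than produced by a modified COP-Kmeans.

```python
from itertools import combinations


def co_association(labelings):
    """Map each pair (i, j), i < j, to the fraction of ensemble members
    that assign points i and j to the same cluster."""
    n = len(labelings[0])
    frac = {}
    for i, j in combinations(range(n), 2):
        together = sum(1 for labels in labelings if labels[i] == labels[j])
        frac[(i, j)] = together / len(labelings)
    return frac


def most_uncertain_pair(labelings, already_labeled=()):
    """Return the unlabeled pair whose co-clustering varies most.

    Assumed uncertainty score: p * (1 - p), which peaks when the pair is
    clustered together in exactly half of the ensemble members.
    Ties are broken in favor of the first pair in index order.
    """
    skip = set(already_labeled)
    candidates = [
        (pair, p) for pair, p in co_association(labelings).items()
        if pair not in skip
    ]
    return max(candidates, key=lambda item: item[1] * (1 - item[1]))[0]


# Toy ensemble: four clusterings of four points (illustrative data only).
labelings = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
]
query = most_uncertain_pair(labelings)
```

In an active sampling loop, the returned pair would be shown to an annotator, the resulting must-link or cannot-link constraint added to the constraint set, and the ensemble recomputed before the next query.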