Premium
Constrained overlapping clusters: minimizing the negative effects of bridge‐nodes
Author(s) -
Scripps Jerry,
Tan PangNing
Publication year - 2010
Publication title -
statistical analysis and data mining: the asa data science journal
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.381
H-Index - 33
eISSN - 1932-1872
pISSN - 1932-1864
DOI - 10.1002/sam.10066
Subject(s) - cluster analysis , computer science , cluster (spacecraft) , bridge (graph theory) , key (lock) , graph , genetic algorithm , data mining , theoretical computer science , identification (biology) , algorithm , artificial intelligence , machine learning , medicine , botany , computer security , programming language , biology
This paper presents a new approach to form overlapping clusters of objects by balancing the effects of incompleteness, impurity, and overlap. Incompleteness results from similar objects separated into different clusters while impurity arises when a cluster contains dissimilar objects. Overlap is caused by nodes that appear in more than one cluster. The key to balancing these effects is the identification of bridge‐nodes. We show the limitations of traditional clustering algorithms in handling bridge‐nodes and demonstrate the intractability of minimizing all three effects. Approximation algorithms based on graph mincut and genetic algorithm are proposed to minimize these effects. Our results with real data sets show significant improvement over traditional methods with regard to incompleteness, impurity, and overlap. Copyright © 2009 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 3: 20‐37, 2010