Soft clustering for information retrieval applications
Author(s) -
Bordogna Gloria,
Pasi Gabriella
Publication year - 2011
Publication title -
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.506
H-Index - 47
eISSN - 1942-4795
pISSN - 1942-4787
DOI - 10.1002/widm.3
Subject(s) - cluster analysis, computer science, fuzzy clustering, data mining, soft computing, information retrieval, document clustering, biclustering, brown clustering, probabilistic logic, hierarchical clustering, correlation clustering, fuzzy logic, artificial intelligence, canopy clustering algorithm, machine learning
This paper gives an overview of soft clustering algorithms applied in the context of information retrieval (IR). First, the motivation for using soft clustering approaches in IR is discussed. Then, the two main flat soft approaches, namely probabilistic clustering and fuzzy clustering, are outlined. Specifically, the expectation maximization and fuzzy c‐means algorithms are introduced, together with some of their extensions, which were defined to overcome their main drawbacks when applied to organizing large document collections. Finally, soft hierarchical clustering algorithms designed for generating taxonomies of documents are introduced. © 2011 John Wiley & Sons, Inc. WIREs Data Mining Knowl Discov 2011 1 138‐146 DOI: 10.1002/widm.3 This article is categorized under: Algorithmic Development > Hierarchies and Trees; Fundamental Concepts of Data and Knowledge > Information Repositories; Technologies > Computational Intelligence; Technologies > Structure Discovery and Clustering