Open Access
Restrictive clustering and metaclustering for self-organizing document collections
Author(s) -
Stefan Siersdorfer,
Sergej Sizov
Publication year - 2004
Publication title -
CiteSeerX (The Pennsylvania State University)
Language(s) - English
Resource type - Conference proceedings
ISBN - 1-58113-881-4
DOI - 10.1145/1008992.1009032
Subject(s) - computer science, cluster analysis, structuring, document clustering, information retrieval, data mining, cluster (spacecraft), machine learning, artificial intelligence, finance, economics, programming language
This paper addresses the problem of automatically structuring heterogeneous document collections by using clustering methods. In contrast to traditional clustering, we study restrictive methods and ensemble-based meta methods that may decide to leave out some documents rather than assigning them to inappropriate clusters with low confidence. These techniques yield higher cluster purity and better overall accuracy, and they make unsupervised self-organization more robust. Our comprehensive experimental studies on three different real-world data collections demonstrate these benefits. The proposed methods seem particularly suitable for automatically substructuring personal email folders or personal Web directories that are populated by focused crawlers, and they can be combined with supervised classification techniques.
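The idea of leaving documents out rather than forcing a low-confidence assignment can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not specify the confidence criterion or how the ensemble is combined, so the cosine-similarity threshold (`min_similarity`), the unanimity rule in `meta_assign`, and all function names here are assumptions chosen for illustration. The sketch also assumes that the cluster labels of the combined clusterings have already been aligned with each other.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def restrictive_assign(docs, centroids, min_similarity=0.8):
    """Assign each document to its most similar centroid, or None (left out)
    when even the best similarity falls below min_similarity.
    The threshold rule is an assumed confidence criterion."""
    labels = []
    for doc in docs:
        sims = [cosine(doc, c) for c in centroids]
        best = max(range(len(centroids)), key=lambda i: sims[i])
        labels.append(best if sims[best] >= min_similarity else None)
    return labels


def meta_assign(labelings):
    """Combine several clusterings of the same documents: keep a label only
    if every clustering assigned the document and all of them agree.
    Assumes cluster ids are already aligned across clusterings."""
    combined = []
    for votes in zip(*labelings):
        first = votes[0]
        agreed = first is not None and all(v == first for v in votes)
        combined.append(first if agreed else None)
    return combined


# Toy data: four 2-dimensional "documents" and two cluster centroids.
docs = [[1.0, 0.0], [0.9, 0.1], [0.5, 0.5], [0.0, 1.0]]
centroids = [[1.0, 0.0], [0.0, 1.0]]

restricted = restrictive_assign(docs, centroids)
# A second, hypothetical clustering of the same documents.
other = [0, 1, 0, 1]
consensus = meta_assign([restricted, other])
```

In this toy run the third document is equally similar to both centroids and is left out by `restrictive_assign`, and `meta_assign` additionally drops the second document because the two clusterings disagree on it. Raising `min_similarity` or adding more clusterings leaves out more documents, trading coverage for purity.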
