
Web document clustering using hyperlink structures
Author(s) -
Xiaofeng He,
Hongyuan Zha,
Chris Ding,
Horst D. Simon
Publication year - 2001
Language(s) - English
Resource type - Reports
DOI - 10.2172/815474
Subject(s) - hyperlink, computer science, cluster analysis, information retrieval, document clustering, world wide web, web page, data mining, artificial intelligence
With the exponential growth of information on the World Wide Web, there is great demand for efficient and effective methods of organizing and retrieving the available information. Document clustering plays an important role in information retrieval and taxonomy management for the World Wide Web and remains an interesting and challenging problem in the field of web computing. In this paper we consider document clustering methods that exploit textual information, hyperlink structure, and co-citation relations. In particular, we apply the normalized-cut clustering method, developed in computer vision, to the task of hyperdocument clustering. We also explore some theoretical connections between the normalized-cut method and the K-means method. We then experiment with the normalized-cut method in the context of clustering query result sets for web search engines.
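The abstract names three similarity signals (text, hyperlinks, co-citation) and the normalized-cut criterion but gives no formulas. The following is a minimal sketch, assuming NumPy, of one way these pieces can fit together. The weighting in `combined_weights` and its parameters `alpha` and `beta` are hypothetical illustrations, not the report's formula. `ncut_bipartition` uses the standard spectral relaxation of the normalized cut, Ncut(A, B) = cut(A, B)/assoc(A, V) + cut(A, B)/assoc(B, V): take the second-smallest generalized eigenvector of (D - W)y = λDy and threshold it at zero. This is one common way to approximate the criterion and not necessarily the exact procedure used in the report.

```python
import numpy as np


def cocitation(adj):
    """Co-citation counts: C[i, j] = number of pages that link to both i and j.

    adj[k, i] = 1 if page k links to page i (directed hyperlink graph).
    """
    c = adj.T @ adj
    np.fill_diagonal(c, 0)  # a page is not co-cited with itself
    return c


def combined_weights(adj, text_sim, alpha=1.0, beta=1.0):
    """Hypothetical similarity: symmetrized links + alpha * co-citation + beta * text."""
    links = ((adj + adj.T) > 0).astype(float)  # ignore link direction
    w = links + alpha * cocitation(adj) + beta * text_sim
    np.fill_diagonal(w, 0.0)
    return w


def ncut_bipartition(w):
    """Two-way split from the spectral relaxation of the normalized cut.

    Solves (D - W) y = lambda D y through the symmetric normalized Laplacian
    I - D^{-1/2} W D^{-1/2}; assumes every page has nonzero degree.
    """
    d = w.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    lap = np.eye(len(d)) - d_inv_sqrt[:, None] * w * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(lap)  # eigenvalues in ascending order
    y = d_inv_sqrt * vecs[:, 1]    # second-smallest eigenvector, mapped back: y = D^{-1/2} z
    return y >= 0                  # threshold at zero -> boolean cluster labels


def ncut_value(w, mask):
    """Ncut(A, B) = cut(A, B) / assoc(A, V) + cut(A, B) / assoc(B, V)."""
    cut = w[mask][:, ~mask].sum()
    return cut / w[mask].sum() + cut / w[~mask].sum()


# Toy graph: pages 0-2 all link to each other, pages 3-5 likewise,
# plus one bridge link 2 -> 3. No text signal in this example.
adj = np.zeros((6, 6), dtype=int)
for group in ([0, 1, 2], [3, 4, 5]):
    for i in group:
        for j in group:
            if i != j:
                adj[i, j] = 1
adj[2, 3] = 1

W = combined_weights(adj, text_sim=np.zeros((6, 6)))
labels = ncut_bipartition(W)
print(np.flatnonzero(labels), np.flatnonzero(~labels))
print(round(ncut_value(W, labels), 3))
```

On this toy graph the split separates pages {0, 1, 2} from {3, 4, 5}, with Ncut = 3/15 + 3/15 = 0.4. Which side is labeled `True` depends on the arbitrary sign of the eigenvector. Applying the bipartition recursively to each side is one way to obtain more than two clusters.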