
Result diversification based on query‐specific cluster ranking
Author(s) - He Jiyin, Meij Edgar, de Rijke Maarten
Publication year - 2011
Publication title - Journal of the American Society for Information Science and Technology
Language(s) - English
Resource type - Journals
eISSN - 1532-2890
pISSN - 1532-2882
DOI - 10.1002/asi.21468
Subject(s) - computer science , ranking (information retrieval) , cluster analysis , information retrieval , result set , data mining , artificial intelligence
Result diversification is a retrieval strategy for dealing with ambiguous or multi‐faceted queries by providing documents that cover as many facets of the query as possible. We propose a result diversification framework based on query‐specific clustering and cluster ranking, in which diversification is restricted to documents belonging to clusters that potentially contain a high percentage of relevant documents. Empirical results show that the proposed framework improves the performance of several existing diversification methods. The framework also gives rise to a simple yet effective cluster‐based approach to result diversification that builds the ranked list by selecting documents from different clusters in a round‐robin fashion. We describe a set of experiments aimed at thoroughly analyzing the behavior of the framework's two main components: ranking clusters and selecting clusters for diversification. Both components have a crucial impact on the overall performance of the framework, but ranking clusters plays a more important role than selecting them. We also examine the properties that clusters should have for the framework to be effective: most relevant documents should be contained in a small number of high‐quality clusters, no cluster should be dominantly large, and the documents in these high‐quality clusters should have diverse content. These properties are strongly correlated with the overall performance of the proposed framework.
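The round‐robin, cluster‐based approach summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: cluster assignments are taken as given (the query‐specific clustering step is not shown), clusters are ranked by the mean retrieval score of their member documents, a fixed number `num_clusters` of top‐ranked clusters is selected, and documents outside the selected clusters are simply left out. The names `rank_clusters` and `round_robin_diversify` are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, List, Sequence


def rank_clusters(
    ranking: Sequence[str],
    scores: Dict[str, float],
    cluster_of: Dict[str, int],
) -> List[int]:
    """Order clusters by the mean retrieval score of their documents.

    Assumption: mean member score is a stand-in cluster-ranking
    criterion, not necessarily the one used in the paper.
    """
    members: Dict[int, List[float]] = defaultdict(list)
    for doc in ranking:
        members[cluster_of[doc]].append(scores[doc])
    return sorted(
        members,
        key=lambda c: sum(members[c]) / len(members[c]),
        reverse=True,
    )


def round_robin_diversify(
    ranking: Sequence[str],
    scores: Dict[str, float],
    cluster_of: Dict[str, int],
    num_clusters: int,
) -> List[str]:
    """Interleave documents from the top-ranked clusters, round robin."""
    # Cluster selection: keep only the num_clusters best-ranked clusters.
    selected = rank_clusters(ranking, scores, cluster_of)[:num_clusters]
    # Each selected cluster keeps its documents in original retrieval order.
    queues = {c: deque(d for d in ranking if cluster_of[d] == c) for c in selected}
    result: List[str] = []
    # Take one document per cluster per pass until all queues are empty.
    while any(queues.values()):
        for c in selected:
            if queues[c]:
                result.append(queues[c].popleft())
    return result


if __name__ == "__main__":
    ranking = ["d1", "d2", "d3", "d4", "d5", "d6"]
    scores = {"d1": 0.9, "d2": 0.8, "d3": 0.7, "d4": 0.6, "d5": 0.5, "d6": 0.1}
    cluster_of = {"d1": 0, "d2": 0, "d3": 1, "d4": 2, "d5": 1, "d6": 2}
    print(round_robin_diversify(ranking, scores, cluster_of, num_clusters=2))
    # prints ['d1', 'd3', 'd2', 'd5']
```

In this toy run, cluster 0 (mean score 0.85) and cluster 1 (mean 0.6) are selected over cluster 2 (mean 0.35), so `d4` and `d6` are excluded. The two main components are kept as separate steps: `rank_clusters` covers cluster ranking and the `[:num_clusters]` cut covers cluster selection, so either can be swapped independently.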