A learning to rank approach for quality‐aware pseudo‐relevance feedback
Author(s) -
Zheng Ye,
Jimmy Xiangji Huang
Publication year - 2016
Publication title -
Journal of the Association for Information Science and Technology
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.903
H-Index - 145
eISSN - 2330-1643
pISSN - 2330-1635
DOI - 10.1002/asi.23430
Subject(s) - relevance feedback , pseudo-relevance feedback , information retrieval , computer science , learning to rank , ranking (information retrieval) , artificial intelligence
Pseudo-relevance feedback (PRF) has been shown to be effective in ad hoc information retrieval. In traditional PRF methods, the top-ranked documents are all assumed to be relevant and are therefore treated equally in the feedback process. However, as our preliminary experiments show, the performance gain brought by each document differs. Thus, it is more reasonable to predict the performance gain brought by each candidate feedback document during PRF. We define the quality level (QL) of a feedback document and use this information to adjust the weights of the feedback terms in these documents. Unlike previous work, we do not make any explicit relevance assumption, and we go beyond merely selecting "good" documents for PRF. We propose a quality-based PRF framework in which two quality-based assumptions are introduced. In particular, two different strategies, relevance-based QL (RelPRF) and improvement-based QL (ImpPRF), are presented to estimate the QL of each feedback document. Based on this, we select a set of heterogeneous document-level features and apply a learning-to-rank approach to evaluate the QL of each feedback document. Extensive experiments on standard TREC (Text REtrieval Conference) test collections show that our proposed model performs robustly and significantly outperforms strong baselines.
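The core idea in the abstract can be illustrated with a minimal sketch: instead of treating all top-ranked documents equally, each document's contribution to feedback-term weights is scaled by a predicted quality level. The function below is a hypothetical illustration, not the paper's actual model; the quality scores would, in the paper's setting, come from a learned ranking model over document-level features, but here they are simply supplied as input, and term weights are plain relative term frequencies.

```python
# Hypothetical sketch of quality-weighted pseudo-relevance feedback:
# each top-ranked document contributes feedback-term weights scaled by
# its predicted quality level (QL), rather than being treated equally.
from collections import Counter


def quality_weighted_feedback(docs, quality, n_terms=5):
    """docs: list of token lists (top-ranked documents).
    quality: predicted QL per document (assumed given; in the paper
             this would come from a learned model over features).
    Returns the top feedback terms with QL-adjusted weights."""
    weights = Counter()
    total_q = sum(quality)
    for tokens, q in zip(docs, quality):
        tf = Counter(tokens)
        length = len(tokens)
        for term, count in tf.items():
            # term weight = quality-scaled relative term frequency
            weights[term] += (q / total_q) * (count / length)
    return weights.most_common(n_terms)


# Toy example: a high-quality document and a noisy one.
docs = [
    ["retrieval", "feedback", "model", "retrieval"],
    ["feedback", "noise", "noise", "noise"],
]
quality = [0.9, 0.1]  # hypothetical predicted quality levels
print(quality_weighted_feedback(docs, quality, n_terms=3))
```

With equal quality scores this reduces to the traditional PRF setting the abstract contrasts against; the QL scaling is what lets terms from noisy feedback documents be discounted.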