The twist measure for IR evaluation: Taking user's effort into account
Author(s) -
Ferro Nicola,
Silvello Gianmaria,
Keskustalo Heikki,
Pirkola Ari,
Järvelin Kalervo
Publication year - 2016
Publication title -
Journal of the Association for Information Science and Technology
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.903
H-Index - 145
eISSN - 2330-1643
pISSN - 2330-1635
DOI - 10.1002/asi.23416
Subject(s) - computer science, grasp, ranking (information retrieval), focus (optics), relevance (law), information retrieval, measure (data warehouse), complement (music), point (geometry), parsing, data science, data mining, artificial intelligence, software engineering, physics, geometry, law, political science, optics, gene, phenotype, biochemistry, chemistry, mathematics, complementation
We present a novel measure for ranking evaluation, called Twist (τ). It is a measure for informational intents, which handles both binary and graded relevance. τ stems from the observation that searching is currently taken for granted and it is natural for users to assume that search engines are available and work well. As a consequence, users may take for granted the utility they gain from finding relevant documents, which is the focus of traditional measures. In contrast, they may feel uneasy when the system returns nonrelevant documents because they are then forced to do additional work to get the desired information, and this causes avoidable effort. The latter is the focus of τ, which evaluates the effectiveness of a system from the point of view of the effort required of users to retrieve the desired information. We provide a formal definition of τ, a demonstration of its properties, and introduce the notion of effort/gain plots, which complement traditional utility‐based measures. By means of an extensive experimental evaluation, τ is shown to grasp different aspects of system performance, not to require extensive and costly assessments, and to be a robust tool for detecting differences between systems.