Evaluating entity resolution results
Author(s) -
David Menestrina,
Steven Euijong Whang,
Héctor García-Molina
Publication year - 2010
Publication title -
proceedings of the vldb endowment
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.946
H-Index - 134
ISSN - 2150-8097
DOI - 10.14778/1920841.1920871
Subject(s) - pairwise comparison , measure (data warehouse) , merge (version control) , computer science , cluster analysis , data mining , edit distance , ranking (information retrieval) , algorithm , distance measures , theoretical computer science , mathematics , artificial intelligence , information retrieval
Entity Resolution (ER) is the process of identifying groups of records that refer to the same real-world entity. Various measures (e.g., pairwise F1, cluster F1) have been used for evaluating ER results. However, ER measures tend to be chosen in an ad-hoc fashion without careful thought as to what defines a good result for the specific application at hand. In this paper, our contributions are twofold. First, we conduct an analysis on existing ER measures, showing that they can often conflict with each other by ranking the results of ER algorithms differently. Second, we explore a new distance measure for ER (called "generalized merge distance" or GMD) inspired by the edit distance of strings, using cluster splits and merges as its basic operations. A significant advantage of GMD is that the cost functions for splits and merges can be configured, enabling us to clearly understand the characteristics of a defined GMD measure. Surprisingly, a state-of-the-art clustering measure called Variation of Information is a special case of our configurable GMD measure, and the widely used pairwise F1 measure can be directly computed using GMD. We present an efficient linear-time algorithm that correctly computes the GMD measure for a large class of cost functions that satisfy reasonable properties.
Accelerating Research
Robert Robinson Avenue,
Oxford Science Park, Oxford
OX4 4GP, United Kingdom
Address
John Eccles HouseRobert Robinson Avenue,
Oxford Science Park, Oxford
OX4 4GP, United Kingdom