
Evaluating Top-N Join Queries with Real-time Entity Resolution
Author(s) -
L. Zhu,
Yafang Cheng,
Yu Wei,
Qin Ma,
Weiyi Meng
Publication year - 2020
Publication title -
Journal of Physics: Conference Series
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.21
H-Index - 85
eISSN - 1742-6596
pISSN - 1742-6588
DOI - 10.1088/1742-6596/1575/1/012084
Subject(s) - tuple, computer science, ranking (information retrieval), join (topology), data mining, set (abstract data type), on the fly, measure (data warehouse), resolution (logic), query optimization, information retrieval, database, mathematics, artificial intelligence, programming language, combinatorics, discrete mathematics, operating system
Using nonmonotone ranking functions in top-N queries is challenging. Traditional techniques for top-N queries assume clean data and do not perform entity resolution (ER). On dirty datasets, where duplicate tuples refer to the same real-world entity, these techniques may return duplicate tuples among the top-N results of a query. Consequently, the effective size of the result set is less than N, and some useful tuples may fail to be retrieved from the datasets, which leads to poor effectiveness. In this paper, we propose a method for processing top-N join queries with real-time ER, using nonmonotone ranking functions and an ER-Index based on a divide-and-conquer mechanism. The method integrates ER into the processing of a top-N join query over dirty datasets on the fly. Extensive experiments are conducted to measure the effectiveness and efficiency of the method over dirty datasets.
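
The problem described in the abstract can be illustrated with a small sketch. It is not the ER-Index method of the paper, whose details are not given here. It is a brute-force illustration under stated assumptions: the relations `hotels` and `restaurants`, the matching rule `entity_key` (names match after lowercasing and dropping non-alphanumeric characters), and the ranking function `score` (closeness of the combined price to a hypothetical target) are all invented for the example.

```python
import heapq
from itertools import product

# Hypothetical dirty relations: "Grand Hotel" and "grand-hotel" are the same entity.
hotels = [
    {"name": "Grand Hotel", "city": "Lyon", "price": 100},
    {"name": "grand-hotel", "city": "Lyon", "price": 100},
    {"name": "Park Inn", "city": "Lyon", "price": 80},
]
restaurants = [
    {"name": "Chez Paul", "city": "Lyon", "price": 50},
    {"name": "Le Bistro", "city": "Lyon", "price": 30},
]


def entity_key(name):
    """Assumed ER rule: two names match if equal after normalization."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def score(hotel, restaurant, target_total=150):
    """Nonmonotone ranking: best when the combined price is closest to the
    target, so neither the cheapest nor the most expensive pair wins."""
    return -abs(hotel["price"] + restaurant["price"] - target_total)


def join(left, right):
    """Equi-join of the two relations on the city attribute."""
    return [(h, r) for h, r in product(left, right) if h["city"] == r["city"]]


def naive_top_n(pairs, n):
    """Top-N without ER: duplicates of one entity can occupy several slots."""
    return heapq.nlargest(n, pairs, key=lambda p: score(*p))


def er_top_n(pairs, n):
    """Top-N with ER applied while scanning results in score order:
    a joined tuple is skipped if its entity pair was already returned."""
    seen = set()
    result = []
    for h, r in sorted(pairs, key=lambda p: score(*p), reverse=True):
        key = (entity_key(h["name"]), entity_key(r["name"]))
        if key in seen:
            continue
        seen.add(key)
        result.append((h, r))
        if len(result) == n:
            break
    return result


def distinct_entities(result):
    """Number of distinct real-world entity pairs in a result list."""
    return len({(entity_key(h["name"]), entity_key(r["name"])) for h, r in result})


pairs = join(hotels, restaurants)
print(distinct_entities(naive_top_n(pairs, 2)))
print(distinct_entities(er_top_n(pairs, 2)))
```

In this sketch the naive query fills both result slots with the same hotel and restaurant under two spellings, whereas the ER-aware scan keeps going until it has N distinct entity pairs. The sketch sorts every joined tuple before resolving duplicates, which is exactly the cost an index-based approach would aim to avoid.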