Open Access
Evaluating Top-N Join Queries with Real-time Entity Resolution
Author(s) - L. Zhu, Yafang Cheng, Yu Wei, Qin Ma, Weiyi Meng
Publication year - 2020
Publication title - Journal of Physics: Conference Series
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.21
H-Index - 85
eISSN - 1742-6596
pISSN - 1742-6588
DOI - 10.1088/1742-6596/1575/1/012084
Subject(s) - tuple, computer science, ranking (information retrieval), join (topology), data mining, set (abstract data type), on the fly, measure (data warehouse), resolution (logic), query optimization, information retrieval, database, mathematics, artificial intelligence, programming language, combinatorics, discrete mathematics, operating system
Using nonmonotone ranking functions in top-N queries is challenging. Traditional techniques for top-N queries assume clean data and do not perform entity resolution (ER). On dirty datasets, where duplicate tuples refer to the same real-world entity, these techniques may return duplicates among the top-N tuples of a query. Consequently, the effective size of the result set is less than N, and some useful tuples may fail to be retrieved, which leads to poor effectiveness. In this paper, we propose a method for processing top-N join queries with real-time ER, using nonmonotone ranking functions and an ER-Index based on a divide-and-conquer mechanism. The method integrates ER with the processing of a top-N join query over dirty datasets on the fly. Extensive experiments are conducted to measure the effectiveness and efficiency of the method over dirty datasets.
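
The duplicate problem described in the abstract can be illustrated with a small sketch. This is not the paper's ER-Index or its divide-and-conquer algorithm, neither of which is specified here; it is a naive baseline under stated assumptions. The relations `products` and `reviews`, the matching rule `entity_key` (names are treated as the same entity if they agree after lowercasing and dropping punctuation), and the distance-to-target ranking function `score` are all hypothetical and chosen only for illustration.

```python
def entity_key(name):
    """Toy ER rule (assumption): names that agree after lowercasing and
    removing non-alphanumeric characters refer to the same entity."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def score(price, rating, target_price=100.0, target_rating=4.0):
    """Hypothetical nonmonotone ranking function: distance to a target
    point. Smaller is better, and the score is not monotone in either
    attribute."""
    return abs(price - target_price) + abs(rating - target_rating)


def top_n_join(products, reviews, n, resolve=True):
    """Naive top-N join. With resolve=True, a tuple is skipped if an
    earlier, better-ranked tuple already covers the same entity."""
    # Build a hash table on the resolved entity key of the inner relation.
    ratings_by_key = {}
    for name, rating in reviews:
        ratings_by_key.setdefault(entity_key(name), []).append(rating)

    # Join and score every matching pair.
    joined = []
    for name, price in products:
        for rating in ratings_by_key.get(entity_key(name), []):
            joined.append((score(price, rating), name, price, rating))
    joined.sort()  # ascending score: best tuples first

    # Scan in rank order, resolving entities on the fly.
    result, seen = [], set()
    for item in joined:
        key = entity_key(item[1])
        if resolve and key in seen:
            continue  # duplicate of an entity already in the result
        seen.add(key)
        result.append(item)
        if len(result) == n:
            break
    return result


# Hypothetical dirty data: two records describe the same phone.
products = [("Acme Phone", 101.0), ("ACME-Phone", 99.0),
            ("Beta Tab", 120.0), ("Gamma Pad", 90.0)]
reviews = [("acme phone", 4.0), ("Beta Tab", 4.5), ("gamma pad", 3.0)]

naive = top_n_join(products, reviews, 2, resolve=False)
resolved = top_n_join(products, reviews, 2, resolve=True)
```

On this toy data, the naive top-2 result holds two records for a single entity, so its effective size is 1, while the resolved top-2 result covers two distinct entities. The sketch scores and sorts the full join result, which an index-based method would presumably avoid.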
