
Record linkage and deduplication using traditional blocking
Author(s) - G. Somasekhar, K SeshaSravani, P Keerthi, Sai Sandeep G
Publication year - 2017
Publication title - International Journal of Engineering and Technology
Language(s) - English
Resource type - Journals
ISSN - 2227-524X
DOI - 10.14419/ijet.v7i1.1.9705
Subject(s) - data deduplication , record linkage , computer science , matching (statistics) , blocking (statistics) , search engine indexing , database , linkage (software) , data mining , information retrieval , mathematics , computer network , population , biochemistry , statistics , demography , sociology , gene , chemistry
Record linkage and deduplication are the two processes used to match records. Records are matched in order to remove duplicates, which strongly influence the outputs of data mining and data processing. When matching is performed within a single database, it is called deduplication. When matching is instead performed across several databases, it is called record linkage. In this paper we also discuss an indexing technique called traditional blocking, which removes non-matching pairs and thereby reduces the number of record pairs that have to be compared.
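As an illustration of how traditional blocking reduces the number of comparisons, the following is a minimal sketch for the deduplication case, not the implementation from the paper. The field names (`surname`, `postcode`), the choice of blocking key, and the sample records are all assumptions made for the example. Records are grouped by a blocking key, and only records that share a key are paired for comparison.

```python
from collections import defaultdict
from itertools import combinations


def blocking_key(record):
    # Hypothetical blocking key: first letter of the surname plus the postcode.
    # A real application would choose the key from its own attributes.
    return (record["surname"][:1].lower(), record["postcode"])


def candidate_pairs(records):
    """Return the record-id pairs that fall into the same block."""
    blocks = defaultdict(list)
    for rec_id, record in records.items():
        blocks[blocking_key(record)].append(rec_id)

    pairs = set()
    for ids in blocks.values():
        # Only records inside the same block are compared with each other.
        pairs.update(combinations(sorted(ids), 2))
    return pairs


# Made-up sample data for a single database (deduplication).
records = {
    1: {"surname": "Smith", "postcode": "2000"},
    2: {"surname": "Smyth", "postcode": "2000"},
    3: {"surname": "Jones", "postcode": "2000"},
    4: {"surname": "Smith", "postcode": "3000"},
}

all_pairs = list(combinations(sorted(records), 2))
blocked_pairs = candidate_pairs(records)
print(len(all_pairs), "pairs without blocking,", len(blocked_pairs), "with blocking")
```

With this sample data, only records 1 and 2 share a block, so a single candidate pair is left out of the six possible pairs. The sketch also shows the usual trade-off: records 1 and 4 carry the same surname but are never compared because their postcodes differ, so a poorly chosen key can hide true duplicates.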