Automatically Utilizing Secondary Sources to Align Information Across Sources
Author(s) - Michalowski Martin, Thakkar Snehal, Knoblock Craig A.
Publication year - 2005
Publication title - AI Magazine
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.597
H-Index - 79
eISSN - 2371-9621
pISSN - 0738-4602
DOI - 10.1609/aimag.v26i1.1797
Subject(s) - computer science, exploit, linkage (software), record linkage, xml, linked data, information retrieval, data integration, information integration, world wide web, process (computing), disparate system, data source, database, semantic web, data mining, population, biochemistry, chemistry, demography, computer security, sociology, gene, operating system
XML, web services, and the semantic web have opened the door for new and exciting information‐integration applications. Information sources on the web are controlled by different organizations or people, utilize different text formats, and have varying inconsistencies. Therefore, any system that integrates information from different data sources must identify common entities from these sources. Data from many data sources on the web does not contain enough information to link the records accurately using state‐of‐the‐art record‐linkage systems. However, it is possible to exploit secondary data sources on the web to improve the record‐linkage process. We present an approach to accurately and automatically match entities from various data sources by utilizing a state‐of‐the‐art record‐linkage system in conjunction with a data‐integration system. The data‐integration system is able to automatically determine which secondary sources need to be queried when linking records from various data sources. In turn, the record‐linkage system is then able to utilize this additional information to improve the accuracy of the linkage between datasets.
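
The idea in the abstract can be illustrated with a small sketch. This is not the authors' system: the records, the `DIRECTORY` lookup table, and the helper names `enrich`, `match_score`, and `best_match` are all hypothetical, and the string comparison uses Python's standard `difflib` rather than a state-of-the-art record-linkage engine. The sketch also hard-codes which secondary source to consult, whereas the abstract describes a data-integration system that determines this automatically. It only shows why an extra attribute pulled from a secondary source can break a tie that the primary sources cannot resolve on their own.

```python
from difflib import SequenceMatcher

# Hypothetical secondary source: a directory mapping street address -> phone.
DIRECTORY = {
    "12 main st": {"phone": "555-0100"},
    "98 oak ave": {"phone": "555-0199"},
}


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def enrich(record: dict, secondary: dict) -> dict:
    """Return a copy of the record with attributes looked up by address.

    Fields already present in the record take precedence over looked-up ones.
    """
    extra = secondary.get(record.get("address", "").lower(), {})
    return {**extra, **record}


def match_score(r1: dict, r2: dict) -> float:
    """Average similarity over the fields the two records have in common."""
    shared = [field for field in r1 if field in r2]
    if not shared:
        return 0.0
    return sum(similarity(r1[f], r2[f]) for f in shared) / len(shared)


def best_match(record: dict, candidates: list, secondary: dict = None) -> dict:
    """Pick the highest-scoring candidate, optionally enriching the record first."""
    if secondary is not None:
        record = enrich(record, secondary)
    return max(candidates, key=lambda c: match_score(record, c))


# Source A knows names and addresses; source B knows names and phone numbers.
source_a = {"name": "Joe's Diner", "address": "12 Main St"}
source_b = [
    {"name": "Joe's Diner", "phone": "555-0199"},
    {"name": "Joe's Diner", "phone": "555-0100"},
]

# With names alone the two candidates tie, so the choice is arbitrary.
without_secondary = best_match(source_a, source_b)

# After the directory lookup adds a phone number, the tie is broken.
with_secondary = best_match(source_a, source_b, DIRECTORY)
```

In this toy setup the only field shared by the two primary sources is the name, so both candidates score identically. Once the record from source A carries the phone number retrieved from the directory, the second candidate is the only one that agrees on every shared field.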
