SiGMa
Author(s) -
Simon Lacoste-Julien,
Konstantina Palla,
Alex Davies,
Gjergji Kasneci,
Thore Graepel,
Zoubin Ghahramani
Publication year - 2013
Publication title -
hal (le centre pour la communication scientifique directe)
Language(s) - English
Resource type - Conference proceedings
DOI - 10.1145/2487575.2487592
Subject(s) - computer science, scalability, benchmark (surveying), greedy algorithm, sigma, matching (statistics), knowledge graph, theoretical computer science, the internet, artificial intelligence, data mining, information retrieval, machine learning, algorithm, mathematics, world wide web, database, statistics, physics, geodesy, quantum mechanics, geography
The Internet has enabled the creation of a growing number of large-scale knowledge bases in a variety of domains containing complementary information. Tools for automatically aligning these knowledge bases would make it possible to unify many sources of structured knowledge and answer complex queries. However, the efficient alignment of large-scale knowledge bases still poses a considerable challenge. Here, we present Simple Greedy Matching (SiGMa), a simple algorithm for aligning knowledge bases with millions of entities and facts. SiGMa is an iterative propagation algorithm that leverages both the structural information from the relationship graph and flexible similarity measures between entity properties in a greedy local search, which makes it scalable. Despite its greedy nature, our experiments indicate that SiGMa can efficiently match some of the world's largest knowledge bases with high accuracy. We provide additional experiments on benchmark datasets which demonstrate that SiGMa can outperform state-of-the-art approaches both in accuracy and efficiency.
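The greedy propagation idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the seed matches, the Jaccard token similarity, the weight `alpha`, the acceptance `threshold`, and all function and variable names are assumptions made for illustration. The sketch only shows the general pattern of scoring candidate pairs by combining a property similarity with agreement in the relationship graph, and greedily accepting the best-scoring pair from a priority queue.

```python
import heapq


def string_sim(a, b):
    """Jaccard similarity over lowercase word tokens (an assumed property measure)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def greedy_align(names1, names2, nbrs1, nbrs2, seeds, alpha=0.5, threshold=0.25):
    """Greedily extend a seed alignment between two small knowledge bases.

    names1/names2 map entity id -> name; nbrs1/nbrs2 map entity id -> set of
    neighbouring entity ids. All parameters are hypothetical.
    """
    matched = dict(seeds)            # entity in KB1 -> entity in KB2
    used2 = set(matched.values())    # KB2 entities already taken
    heap = []                        # max-heap via negated scores

    def score(i, j):
        # Graph part: share of i's neighbours already matched to neighbours of j.
        n1, n2 = nbrs1.get(i, set()), nbrs2.get(j, set())
        agree = sum(1 for u in n1 if matched.get(u) in n2)
        graph = agree / max(len(n1), len(n2), 1)
        return alpha * string_sim(names1[i], names2[j]) + (1 - alpha) * graph

    def push_neighbours(i, j):
        # Propagation step: neighbours of a matched pair become candidates.
        for u in nbrs1.get(i, set()):
            for v in nbrs2.get(j, set()):
                if u not in matched and v not in used2:
                    heapq.heappush(heap, (-score(u, v), u, v))

    for i, j in seeds.items():
        push_neighbours(i, j)

    while heap:
        neg_score, i, j = heapq.heappop(heap)
        if i in matched or j in used2 or -neg_score < threshold:
            continue                 # skip stale or low-scoring candidates
        matched[i] = j
        used2.add(j)
        push_neighbours(i, j)        # rescore the neighbourhood with new context
    return matched


# Toy data: two tiny knowledge bases about the same director and films.
names1 = {"a1": "Stanley Kubrick", "a2": "The Shining", "a3": "Barry Lyndon"}
names2 = {"b1": "Stanley Kubrick", "b2": "Shining", "b3": "Barry Lyndon (film)"}
nbrs1 = {"a1": {"a2", "a3"}, "a2": {"a1"}, "a3": {"a1"}}
nbrs2 = {"b1": {"b2", "b3"}, "b2": {"b1"}, "b3": {"b1"}}

alignment = greedy_align(names1, names2, nbrs1, nbrs2, seeds={"a1": "b1"})
```

In this toy run the single seed pair is enough to propagate matches to both films, because each candidate pair gains graph evidence from the already matched director. Restricting candidates to neighbours of matched pairs is what keeps the search local; how the published algorithm chooses seeds and scores pairs should be taken from the paper itself.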