Open Access
Pattern Sampling in Distributed Databases
Author(s) - Lamine Diop, Cheikh Talibouya Diop, Arnaud Giacometti, Arnaud Soulet
Publication year - 2020
Publication title - Lecture Notes in Computer Science
Language(s) - English
Resource type - Book series
SCImago Journal Rank - 0.249
H-Index - 400
eISSN - 1611-3349
pISSN - 0302-9743
DOI - 10.1007/978-3-030-54832-2_7
Subject(s) - computer science, soundness, data mining, robustness (evolution), outlier, benchmark (surveying), distributed database, database, artificial intelligence, biochemistry, chemistry, geodesy, gene, programming language, geography
Many applications rely on distributed databases. However, few discovery methods can extract patterns without centralizing the data. In fact, centralizing the data is often less expensive than communicating the extracted patterns from the different nodes. To circumvent this difficulty, this paper revisits pattern mining in distributed databases by leveraging pattern sampling. Specifically, we propose the algorithm DDSampling, which randomly draws a pattern from a distributed database with a probability proportional to its interest. We demonstrate the soundness of DDSampling and analyze its time complexity. Finally, experiments on benchmark datasets highlight its low communication cost and its robustness. We also illustrate its usefulness on real-world data from the Semantic Web by detecting outlier entities in DBpedia and Wikidata.
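
To make "draws a pattern with a probability proportional to its interest" concrete, here is a minimal Python sketch. It is not the paper's DDSampling algorithm, whose details the abstract does not give. It assumes the simplest possible setting: interest is plain frequency (support), patterns are itemsets, and every transaction is stored whole on exactly one node. The names `node_weight` and `draw_pattern` are hypothetical. The sketch uses the classical two-step idea: choose a transaction with weight 2^|t|, then return a uniformly random subset of it. This makes each itemset's probability proportional to the number of transactions that contain it.

```python
import random

def node_weight(transactions):
    """Total number of (transaction, sub-itemset) pairs held by one node.

    A transaction with k items contains 2**k itemsets (the empty set included).
    """
    return sum(2 ** len(t) for t in transactions)

def draw_pattern(nodes, rng):
    """Draw one itemset with probability proportional to its global support.

    `nodes` is a list of nodes; each node is a list of transactions (sets of items).
    Assumption: each transaction lives entirely on one node.
    """
    # Step 1: a coordinator only needs one integer per node, not the data itself.
    weights = [node_weight(n) for n in nodes]
    node = rng.choices(nodes, weights=weights, k=1)[0]

    # Step 2: the chosen node picks a local transaction with weight 2**|t|.
    t = rng.choices(node, weights=[2 ** len(t) for t in node], k=1)[0]

    # Step 3: a uniform subset of t, keeping each item with probability 1/2.
    return frozenset(item for item in t if rng.random() < 0.5)

if __name__ == "__main__":
    nodes = [[{"a", "b"}, {"a"}], [{"a", "b", "c"}]]
    rng = random.Random(0)
    print([sorted(draw_pattern(nodes, rng)) for _ in range(5)])
```

In this simplified setting the only values sent to the coordinator are one weight per node and the single sampled pattern, which gives some intuition for why sampling can keep communication low. Handling transactions that are split across several nodes is beyond what this sketch covers.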
