Redblock: a tool for online deduplication on large datasets
Author(s) - Luan Félix Pimentel, Igor Lemos Vicente, Guilherme Dal Bianco
Publication year - 2017
Publication title - Revista Brasileira de Computação Aplicada
Language(s) - English
Resource type - Journals
ISSN - 2176-6649
DOI - 10.5335/rbca.v9i2.7143
Subject(s) - data deduplication , computer science , process (computing) , data mining , database , operating system
Online data deduplication aims to identify records that represent the same entity in a continuous data stream. It must process a wide range of information with high effectiveness and without delay. This paper introduces Redblock, a tool for real-time data deduplication that combines a distributed platform for online processing with an inverted index. In the experimental evaluation, Redblock produced good preliminary results in terms of efficiency and effectiveness on a database.
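
The abstract does not describe how Redblock uses its inverted index, so the following is only a minimal, single-process sketch of the general idea of inverted-index lookup for online deduplication, not Redblock's implementation. Everything in it is an assumption made for illustration: the class name `OnlineDeduplicator`, whitespace tokenization, Jaccard similarity, and the 0.6 threshold. It also leaves out the distributed platform entirely.

```python
from collections import defaultdict


def tokens(record: str) -> set[str]:
    """Lowercase whitespace tokens (assumed; a stand-in for any real key)."""
    return set(record.lower().split())


def jaccard(a: set[str], b: set[str]) -> float:
    """Token-set overlap in [0, 1]."""
    return len(a & b) / len(a | b) if a or b else 0.0


class OnlineDeduplicator:
    """Hypothetical helper: checks each incoming record against earlier ones."""

    def __init__(self, threshold: float = 0.6) -> None:
        self.threshold = threshold
        # Inverted index: token -> ids of records containing that token.
        self.index: dict[str, set[int]] = defaultdict(set)
        # Record id -> token set, kept for the similarity check.
        self.records: list[set[str]] = []

    def add(self, record: str) -> list[int]:
        """Return ids of earlier likely duplicates, then index the record."""
        toks = tokens(record)
        # Candidates are only the records sharing at least one token,
        # so the new record is never compared against the whole dataset.
        candidates: set[int] = set()
        for t in toks:
            candidates |= self.index.get(t, set())
        matches = sorted(
            rid for rid in candidates
            if jaccard(toks, self.records[rid]) >= self.threshold
        )
        rid = len(self.records)
        self.records.append(toks)
        for t in toks:
            self.index[t].add(rid)
        return matches


dedup = OnlineDeduplicator(threshold=0.6)
first = dedup.add("John A. Smith, Oxford")
second = dedup.add("john a. smith, oxford uk")
third = dedup.add("Maria Silva, Passo Fundo")
```

In this sketch, each arriving record is matched only against records that share a token with it, which is what keeps the per-record cost low as the stream grows. A real deployment would also need to bound index size and partition the index across workers.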