Open Access
Parallel Incremental 2D-Discretization on Dynamic Datasets
Author(s) -
Srinivasan Parthasarathy,
Arun Ramakrishnan
Publication year - 2002
Language(s) - English
DOI - 10.1109/ipdps.2002.10008
Most current work in data mining assumes that the database is static, so that any database update requires rediscovering all the patterns by scanning the entire old and new database. Such approaches can waste substantial computational and I/O resources and lead to slow response times in what is essentially an interactive process. In this paper we address this issue in the context of 2-dimensional discretization within a multi-attribute database. Discretization, an important problem in data mining, is typically used to partition the range of one or more continuous attributes into intervals that highlight the behavior of a related discrete attribute. It can be used to build decision trees and to determine appropriate aggregations for On-Line Analytical Processing (OLAP). We first propose a time-optimal solution to the problem. We then parallelize and incrementalize the algorithm so that it can dynamically maintain the required information as the data is updated, without re-executing the algorithm on the entire dataset. Experimental results confirm that our approach yields execution time improvements of up to several orders of magnitude on large datasets.
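To make the incremental idea concrete, here is a minimal Python sketch. It is not the authors' time-optimal or parallel algorithm. It assumes fixed candidate cut points on two continuous attributes, keeps per-class record counts for each cell of the resulting grid, and picks one cut per axis (four quadrants) that minimizes misclassified records, with each quadrant predicting its majority class. The class name `IncrementalGrid2D`, the fixed cut points, and the misclassification criterion are all hypothetical choices made for illustration.

```python
from bisect import bisect_right


class IncrementalGrid2D:
    """Per-class counts on a fixed 2D grid over two continuous attributes.

    Hypothetical sketch: cut points are fixed up front; only counts change.
    """

    def __init__(self, x_cuts, y_cuts, classes):
        self.x_cuts = sorted(x_cuts)
        self.y_cuts = sorted(y_cuts)
        self.classes = list(classes)
        self.nx = len(self.x_cuts) + 1
        self.ny = len(self.y_cuts) + 1
        # counts[c][i][j] = number of records of class c in grid cell (i, j)
        self.counts = {c: [[0] * self.ny for _ in range(self.nx)] for c in self.classes}

    def _cell(self, x, y):
        # Cell i holds values in [cuts[i-1], cuts[i]); found by binary search.
        return bisect_right(self.x_cuts, x), bisect_right(self.y_cuts, y)

    def insert(self, x, y, label):
        # An insert touches exactly one cell; no rescan of older records.
        i, j = self._cell(x, y)
        self.counts[label][i][j] += 1

    def delete(self, x, y, label):
        i, j = self._cell(x, y)
        if self.counts[label][i][j] == 0:
            raise ValueError("record not present in this cell")
        self.counts[label][i][j] -= 1

    def best_split(self):
        """Return (errors, i, j) for the cut after x-cell i and y-cell j.

        Returns None if either axis has no cut points.
        """
        nx, ny = self.nx, self.ny
        # 2D prefix sums: P[a][b] = class-c count in cells [0, a) x [0, b)
        prefix = {}
        for c in self.classes:
            P = [[0] * (ny + 1) for _ in range(nx + 1)]
            for a in range(nx):
                for b in range(ny):
                    P[a + 1][b + 1] = (self.counts[c][a][b] + P[a][b + 1]
                                       + P[a + 1][b] - P[a][b])
            prefix[c] = P

        def rect(c, a0, a1, b0, b1):
            # Count of class c in cells [a0, a1) x [b0, b1)
            P = prefix[c]
            return P[a1][b1] - P[a0][b1] - P[a1][b0] + P[a0][b0]

        best = None
        for i in range(1, nx):
            for j in range(1, ny):
                errors = 0
                for a0, a1 in ((0, i), (i, nx)):
                    for b0, b1 in ((0, j), (j, ny)):
                        per_class = [rect(c, a0, a1, b0, b1) for c in self.classes]
                        # Records outside the quadrant's majority class are errors.
                        errors += sum(per_class) - max(per_class)
                if best is None or errors < best[0]:
                    best = (errors, i, j)
        return best


grid = IncrementalGrid2D(x_cuts=[10, 20], y_cuts=[10, 20], classes=["a", "b"])
for x, y, label in [(5, 5, "a"), (15, 5, "a"), (5, 15, "a"), (15, 15, "a"),
                    (25, 25, "b"), (25, 5, "b"), (5, 25, "b")]:
    grid.insert(x, y, label)
print(grid.best_split())  # (0, 2, 2)
grid.delete(25, 5, "b")
grid.delete(5, 25, "b")
print(grid.best_split())  # (0, 1, 2)
```

In this sketch each insert or delete costs one binary search per axis, and re-evaluating the split scans only the grid summary (proportional to the number of classes times the number of cells), never the raw records. That is the general property incremental maintenance aims for, although the paper's own data structures and optimality guarantees are not reproduced here.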