

Open Access

Parallel Incremental 2D-Discretization on Dynamic Datasets

Author(s) - Srinivasan Parthasarathy, Arun Ramakrishnan

Publication year - 2002

Language(s) - English

DOI - 10.1109/ipdps.2002.10008

Most current work in data mining assumes that the database is static, so a database update requires rediscovering all patterns by rescanning both the old and the new data. Such approaches can waste substantial computational and I/O resources and yield relatively slow response times for what is essentially an interactive process. In this paper we address this issue in the context of 2-dimensional discretization within a multiattribute database. Discretization, an important problem in data mining, is typically used to partition the range of one or more continuous attributes into intervals that highlight the behavior of a related discrete attribute. It can be used to build decision trees and to determine appropriate aggregations for On-Line Analytical Processing (OLAP). We first propose a time-optimal solution to the problem. We then parallelize the algorithm and make it incremental, so that it can dynamically maintain the required information in the presence of data updates without re-executing on the entire dataset. Experimental results confirm that our approach improves execution time by up to several orders of magnitude on large datasets.
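To make the idea of incremental maintenance concrete, the following is a minimal Python sketch and not the algorithm from the paper. It assumes a fixed base grid over two continuous attributes, keeps per-cell class counts that are updated on each insert or delete, and scores a candidate partition by its weighted class entropy using only those counts. The names `IncrementalGrid2D` and `partition_entropy`, the entropy criterion, and the toy data are all illustrative assumptions.

```python
import bisect
import math
from collections import Counter


class IncrementalGrid2D:
    """Class counts over a fixed base grid of two continuous attributes.

    Hypothetical helper for illustration; not the data structure from the paper.
    """

    def __init__(self, x_edges, y_edges):
        # n sorted cut points define n + 1 base intervals per axis.
        self.x_edges = sorted(x_edges)
        self.y_edges = sorted(y_edges)
        self.cells = {}  # (ix, iy) -> Counter mapping class label to count

    def _key(self, x, y):
        # A value equal to a cut point falls into the interval to its right.
        return (bisect.bisect_right(self.x_edges, x),
                bisect.bisect_right(self.y_edges, y))

    def insert(self, x, y, label):
        # Touches exactly one cell; existing data is never rescanned.
        self.cells.setdefault(self._key(x, y), Counter())[label] += 1

    def delete(self, x, y, label):
        counts = self.cells.get(self._key(x, y))
        if counts is None or counts[label] <= 0:
            raise KeyError("no such record in the grid")
        counts[label] -= 1

    def region_counts(self, x_bins, y_bins):
        # Sum class counts over the base cells covered by a region.
        total = Counter()
        for (ix, iy), counts in self.cells.items():
            if ix in x_bins and iy in y_bins:
                total.update(counts)
        return total


def entropy(counts):
    """Shannon entropy (in bits) of a class-count distribution."""
    n = sum(counts.values())
    if n == 0:
        return 0.0
    return sum(-(c / n) * math.log2(c / n) for c in counts.values() if c > 0)


def partition_entropy(grid, regions):
    """Weighted class entropy of a partition given as (x_bins, y_bins) pairs."""
    per_region = [grid.region_counts(xb, yb) for xb, yb in regions]
    sizes = [sum(c.values()) for c in per_region]
    total = sum(sizes)
    if total == 0:
        return 0.0
    return sum((n / total) * entropy(c) for n, c in zip(sizes, per_region))


if __name__ == "__main__":
    grid = IncrementalGrid2D(x_edges=[5.0], y_edges=[5.0])
    for x, y, label in [(1, 1, "a"), (2, 2, "a"), (8, 8, "b"), (9, 9, "b")]:
        grid.insert(x, y, label)

    both = {0, 1}
    split_on_x = [({0}, both), ({1}, both)]
    print("before update:", partition_entropy(grid, split_on_x))

    # A new record arrives; only one cell count changes before re-scoring.
    grid.insert(1, 9, "b")
    print("after update: ", partition_entropy(grid, split_on_x))
```

In this sketch the cost of an update is independent of the number of stored records, since a partition is scored from the per-cell counts alone. A full discretizer would additionally search over candidate partitions, which is omitted here.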
