Querying Very Large Multi-dimensional Datasets in ADR

Tahsin M. Kurç, Chialin Chang, Renato Ferreira, Alan Sussman, Joel H. Saltz

Open Access

Publication year - 1999

Language(s) - English

DOI - 10.1109/sc.1999.10046

Analysis and processing of very large multi-dimensional scientific datasets (i.e., datasets in which data items are associated with points in a multi-dimensional attribute space) is an important component of science and engineering. Moreover, an increasing number of applications make use of very large multi-dimensional datasets. Examples of such datasets include raw and processed sensor data from satellites [12], output from hydrodynamics and chemical transport simulations [10], and archives of medical images [1].

Many applications that make use of multi-dimensional datasets share several important characteristics. Both the input and the output are often disk-resident datasets. Applications may use only a subset of all the data available in the input and output datasets. Access to data items is described by a range query, namely a multi-dimensional bounding box in the underlying multi-dimensional attribute space of the dataset. Only the data items whose associated coordinates fall within the multi-dimensional box are retrieved.

The processing structures of these applications also share common characteristics. Figure 1 shows high-level pseudo-code for the basic processing loop in these applications. The processing steps consist of retrieving input and output data items that intersect the range query (steps 1–2 and 4–5), mapping the coordinates of the retrieved input items to the corresponding output items (step 6), and aggregating, in some way, all the retrieved input items mapped to the same output data items (steps 7–8). Correctness of the output usually does not depend on the order in which input data items are aggregated. The mapping function, Map(ie), maps an input item (ie) to a set of output items. An intermediate data structure, referred to as an accumulator, is used to hold intermediate results during processing. For example, an accumulator can be used to keep a running sum for an averaging operation.
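
The processing loop described above can be illustrated with a small in-memory sketch. This is not the pseudo-code of Figure 1 and not the ADR interface: the names `Box`, `Item`, `process_query`, `cell_of`, and the unit-cell output grid are all hypothetical, chosen only to show a range query, a mapping function, an accumulator, and an order-independent aggregation (here, averaging) working together.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, Hashable, Iterable, Tuple

Point = Tuple[float, ...]


@dataclass(frozen=True)
class Box:
    """Axis-aligned multi-dimensional bounding box used as the range query."""
    low: Point
    high: Point

    def contains(self, p: Point) -> bool:
        return all(lo <= x <= hi for lo, x, hi in zip(self.low, p, self.high))


@dataclass(frozen=True)
class Item:
    """An input data item: a point in the attribute space plus a value."""
    coords: Point
    value: float


def process_query(
    query: Box,
    input_items: Iterable[Item],
    output_coords: Dict[Hashable, Point],
    map_fn: Callable[[Item], Iterable[Hashable]],
    init: Callable[[], tuple],
    aggregate: Callable[[Item, tuple], tuple],
    finalize: Callable[[tuple], float],
) -> Dict[Hashable, float]:
    # Select the output items that fall inside the query box and allocate
    # one initialized accumulator element for each of them.
    acc = {k: init() for k, c in output_coords.items() if query.contains(c)}

    # Reduction phase: retrieve input items inside the box, map each one to
    # its output items, and fold its value into the matching accumulators.
    for ie in input_items:
        if not query.contains(ie.coords):
            continue
        for key in map_fn(ie):
            if key in acc:
                acc[key] = aggregate(ie, acc[key])

    # Turn accumulator elements into final output values.
    return {k: finalize(ae) for k, ae in acc.items()}


def cell_of(ie: Item):
    """Hypothetical Map: send a point to the unit grid cell that contains it."""
    return [tuple(math.floor(x) for x in ie.coords)]


# Averaging: the accumulator element is a (running sum, count) pair.
result = process_query(
    query=Box((0.0, 0.0), (2.0, 2.0)),
    input_items=[
        Item((0.2, 0.3), 1.0),
        Item((0.8, 0.1), 3.0),
        Item((1.5, 0.5), 10.0),
        Item((5.5, 5.5), 100.0),  # outside the query box, never retrieved
    ],
    output_coords={(0, 0): (0.5, 0.5), (1, 0): (1.5, 0.5), (5, 5): (5.5, 5.5)},
    map_fn=cell_of,
    init=lambda: (0.0, 0),
    aggregate=lambda ie, ae: (ae[0] + ie.value, ae[1] + 1),
    finalize=lambda ae: ae[0] / ae[1] if ae[1] else float("nan"),
)
print(result)
```

In this sketch the accumulator holds a running sum and a count, so feeding the same input items in a different order produces the same averages. A disk-resident setting would replace the in-memory lists with retrieval of only those stored items that intersect the query box.
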
The aggregation function, Aggregate(ie, ae), aggregates the value of an input item (ie) with the intermediate result stored in the accumulator element (ae). The output dataset from a query is usually much smaller than the input dataset; hence steps 4–8 are called the reduction phase of the processing. Accumulator elements are allocated and initialized (step 3) before the reduction phase. Another constraint is that there is

This research was supported by the National Science Foundation under Grants #BIR9318183 and #ACI-9619020 (UC Subcontract #10152408), and by the Office of Naval Research under Grant #N6600197C8534. The Maryland IBM SP2 used for the experiments was provided by NSF CISE Institutional Infrastructure Award #CDA9401151 and a grant from IBM.
