z-logo
open-access-imgOpen Access
On the analysis of indexing schemes
Author(s) -
Joseph M. Hellerstein,
Ηλίας Κουτσουπιάς,
Christos H. Papadimitriou
Publication year - 1997
Publication title -
citeseer x (the pennsylvania state university)
Language(s) - English
Resource type - Conference proceedings
ISBN - 0-89791-910-6
DOI - 10.1145/263661.263688
Subject(s) - computer science , citation , library science , search engine indexing , world wide web
We consider the problem of indexing general database workloads (combinations of data sets and sets of poten- tial queries). We define a framework for measuring the efficiency of an indexing scheme for a workload based on two characterizations: storage redundancy (how many times each item in the data set is stored), and access overhead (how many times more blocks than necessary does a query retrieve). Using this framework we present some initial results, showing upper and lower bounds and trade-offs between them in the case of multi-dimen- sional range queries and set queries. A systems approach to this "generalized indexing" problem has been proposed and implemented (ll). The need for theoretical tools for the rigorous analysis of indexing problems was one of the main conclusions of that work. What seems to be needed is a kind of the- o y of indexability, a mathematical methodology which, in analogy with tractability, would evaluate rigorously the power and limitations of indexing techniques in di- verse contexts. What differentiates such a theoretical approach to indexing from complexity theory and the theory of in-memory data structures is its emphasis on the secondary storage nature of indexing schemes, and on the two aspects that determine their cost and feasi- bility: storage utilization and disk accesses. In this paper, we lay out a general framework for for- mally evaluating the power-and limitations of indexing schemes. We identify the salient features of an index- ing problem, and define two simple metrics for judg- ing the difficulty of the problem. Based on this frame- work, we provide some initial results: lower bounds and space/time tradeoffs for multidimensional range queries and for set inclusion queries. One novelty of our results is that they are stated exclusively in terms of a param- eter B, the block size. This important technological pa- rameter, which is usually ignored in the data structures literature, is at center stage in our work. Interestingly, the size of the instance does not enter the statements of our results at all.

The content you want is available to Zendy users.

Already have an account? Click here to sign in.
Having issues? You can contact us here
Accelerating Research

Address

John Eccles House
Robert Robinson Avenue,
Oxford Science Park, Oxford
OX4 4GP, United Kingdom