z-logo
Premium
Time‐decayed correlated aggregates over data streams
Author(s) -
Cormode Graham,
Tirthapura Srikanta,
Xu Bojian
Publication year - 2009
Publication title -
statistical analysis and data mining: the asa data science journal
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.381
H-Index - 33
eISSN - 1932-1872
pISSN - 1932-1864
DOI - 10.1002/sam.10053
Subject(s) - data stream mining , dimension (graph theory) , computer science , data stream , exponential growth , exponential function , sliding window protocol , exponential decay , computation , algorithm , predicate (mathematical logic) , space (punctuation) , approximation error , theoretical computer science , data mining , mathematics , window (computing) , combinatorics , telecommunications , mathematical analysis , physics , nuclear physics , programming language , operating system
Data stream analysis frequently relies on identifying correlations and posing conditional queries on the data after it has been seen. Correlated aggregates form an important example of such queries, which ask for an aggregation over one dimension of stream elements which satisfy a predicate on another dimension. Since recent events are typically more important than older ones, time decay should also be applied to downweight less significant values. We present space‐efficient algorithms as well as space lower bounds for the time‐decayed correlated sum, a problem at the heart of many related aggregations. By considering different fundamental classes of decay functions, we separate cases where efficient approximations with relative error or additive error guarantees are possible, from other cases where linear space is necessary to approximate. In particular, we show that no efficient algorithms with relative error guarantees are possible for the popular sliding window and exponential decay models, resolving an open problem. This negative result for the exponential decay holds even if the stream is allowed to be processed in multiple passes. The results are surprising, since efficient approximations are known for other data stream problems under these decay models. This is a step toward better understanding which sophisticated queries can be answered on massive streams using limited memory and computation. Copyright © 2009 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 2: 294‐310, 2009

This content is not available in your region!

Continue researching here.

Having issues? You can contact us here