Space-Efficient Estimation of Statistics Over Sub-Sampled Streams
Author(s) -
Andrew McGregor,
A. Pavan,
Srikanta Tirthapura,
David P. Woodruff
Publication year - 2015
Publication title -
algorithmica
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.647
H-Index - 78
eISSN - 1432-0541
pISSN - 0178-4617
DOI - 10.1007/s00453-015-9974-0
Subject(s) - data stream , normalization (sociology) , data stream mining , streams , computer science , statistics , theory of computation , entropy (arrow of time) , algorithm , data mining , mathematics , computer network , physics , quantum mechanics , sociology , anthropology
In many stream monitoring situations, the data arrival rate is so high that it is not even possible to observe each element of the stream. The most common solution is to sample a small fraction of the data stream and use the sample to infer properties and estimate aggregates of the original stream. However, the quantities that need to be computed on the sampled stream are often different from the original quantities of interest and their estimation requires new algorithms. We present upper and lower bounds (often matching) for estimating frequency moments, support size, entropy, and heavy hitters of the original stream from the data observed in the sampled stream.
Accelerating Research
Robert Robinson Avenue,
Oxford Science Park, Oxford
OX4 4GP, United Kingdom
Address
John Eccles HouseRobert Robinson Avenue,
Oxford Science Park, Oxford
OX4 4GP, United Kingdom