Open Access
Optimized Bootstrap Sampling for σ-AQP Error Estimation: A Pilot Study
Author(s) - Semih Cal, E P Cheng, Feng Yu
Publication year - 2021
Publication title - EPiC Series in Computing
Language(s) - English
Resource type - Conference proceedings
ISSN - 2398-7340
DOI - 10.29007/bkw9
Subject(s) - computer science, sampling (signal processing), data mining, query optimization, selection (genetic algorithm), confidence interval, big data, machine learning, statistics, mathematics, filter (signal processing), computer vision
Approximate query processing (AQP) aims to efficiently provide an approximate answer, close to the exact one, for a complex query on large datasets, especially big data. It brings substantial benefits to many data science fields where the efficiency of query execution matters more than accuracy. However, assessing the accuracy of an approximate answer from AQP deserves more study. Existing work usually relies on strong assumptions about the dataset, which may not hold for real-world data. In this work, we employ bootstrap sampling to assess the estimation errors of AQP for selection queries (called σ-AQP). We implement a prototype system that calculates confidence intervals for the estimated query results. Experimental results demonstrate that the confidence intervals generated by the prototype system cover the ground truth of the query results with high accuracy and low computing cost. In addition, we implement optimization strategies for the bootstrap sampling that significantly improve the overall computing efficiency.
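To make the idea of a bootstrap confidence interval for a selection query concrete, the following is a minimal sketch, not the prototype system described in the abstract. It assumes a uniform random sample of the table, a COUNT-style selection query, and a percentile bootstrap; the function name `bootstrap_count_ci`, its parameters, and the synthetic data are all hypothetical and chosen only for illustration. Evaluating the predicate once and resampling the resulting 0/1 indicators is a convenience of this sketch and is not claimed to be one of the paper's optimization strategies.

```python
import random


def bootstrap_count_ci(sample, predicate, population_size,
                       n_boot=1000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the estimated number
    of rows in the full table that satisfy `predicate`.

    Hypothetical helper: `sample` is assumed to be a uniform random
    sample of the table, and `population_size` the table's row count.
    """
    rng = random.Random(seed)
    n = len(sample)
    scale = population_size / n  # scales a sample count up to the table

    # Evaluate the selection predicate once per sampled row.
    hits = [1 if predicate(row) else 0 for row in sample]
    point_estimate = sum(hits) * scale

    # Resample the indicators with replacement and re-estimate each time.
    estimates = []
    for _ in range(n_boot):
        resample = rng.choices(hits, k=n)
        estimates.append(sum(resample) * scale)
    estimates.sort()

    # Percentile interval: cut alpha/2 from each tail of the estimates.
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return point_estimate, lower, upper


# Synthetic example (assumed data): a 100,000-row column of integers
# and a 1,000-row uniform sample drawn from it.
data_rng = random.Random(42)
table = [data_rng.randint(0, 99) for _ in range(100_000)]
sample = data_rng.sample(table, 1_000)

# Selection query: SELECT COUNT(*) FROM table WHERE value < 30
point, lower, upper = bootstrap_count_ci(
    sample, lambda value: value < 30, population_size=len(table))
true_count = sum(1 for value in table if value < 30)
print(f"estimate={point:.0f}  95% CI=[{lower:.0f}, {upper:.0f}]  "
      f"exact={true_count}")
```

The printed line places the exact count next to the interval so that coverage can be checked by eye. A single run shows only whether this one interval covers the truth; estimating a coverage rate would require repeating the experiment over many independent samples.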