A sketch for the KS test for Big Data
Author(s) - Thalis D. Galeno, João Gama, Douglas O. Cardoso
Publication year - 2021
Language(s) - English
Resource type - Conference proceedings
DOI - 10.5753/kdmile.2021.17455
Subject(s) - univariate, test statistic, computer science, sketch, statistic, kolmogorov–smirnov test, goodness of fit, sample (material), parametric statistics, statistical hypothesis testing, statistics, algorithm, sample space, test (biology), data mining, mathematics, artificial intelligence, machine learning, multivariate statistics, chemistry, chromatography, paleontology, biology
Motivated by the challenges of Big Data, this paper presents an approximate algorithm for performing the Kolmogorov-Smirnov test. This goodness-of-fit test is widely used because it is non-parametric. This work focuses on the one-sample test, which evaluates the hypothesis that a given univariate sample follows some reference distribution. The method makes it possible to evaluate how far an input stream departs from such a distribution while remaining efficient in both space and time. We assess the accuracy of our algorithm through several experiments in different scenarios, varying the reference distribution and its parameters, the sample size, and the available memory. Its performance is compared with that of rival methods, some of which are considered state of the art. The results show that our algorithm is superior in most cases with respect to the absolute error of the test statistic.
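
For orientation, the quantity being approximated is the one-sample KS statistic, D_n = sup_x |F_n(x) - F(x)|, where F_n is the empirical distribution function of the sample and F is the reference distribution function. The Python sketch below is not the paper's sketch algorithm, which the abstract does not describe. It only shows the exact statistic, which needs the whole sample in memory, next to a generic bounded-memory stand-in (plain reservoir sampling, chosen here purely for illustration and not claimed to be one of the methods compared in the paper). The function names, the normal reference distribution, and the memory budgets are all assumptions.

```python
import math
import random


def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of a normal distribution, used here as an example reference F."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def ks_statistic(sample, cdf):
    """Exact one-sample KS statistic D_n = sup_x |F_n(x) - F(x)|.

    Requires the full sample (O(n) memory, O(n log n) time for the sort).
    """
    xs = sorted(sample)
    n = len(xs)
    if n == 0:
        raise ValueError("sample must be non-empty")
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def reservoir_sample(stream, k, rng):
    """Keep a uniform random subset of at most k items from a stream.

    This is standard reservoir sampling (Algorithm R), an illustrative
    bounded-memory baseline and NOT the algorithm proposed in the paper.
    """
    reservoir = []
    for t, x in enumerate(stream):
        if t < k:
            reservoir.append(x)
        else:
            j = rng.randrange(t + 1)
            if j < k:
                reservoir[j] = x
    return reservoir


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical input stream: 100,000 draws from a standard normal.
    stream = [rng.gauss(0.0, 1.0) for _ in range(100_000)]

    exact = ks_statistic(stream, normal_cdf)
    for k in (100, 1_000, 10_000):  # assumed memory budgets, in items
        approx = ks_statistic(reservoir_sample(stream, k, rng), normal_cdf)
        print(f"k={k:>6}  D approx={approx:.5f}  abs. error={abs(approx - exact):.5f}")
```

The absolute error printed in the last column is the kind of accuracy measure the abstract refers to: the gap between the statistic computed under a memory budget and the exact statistic computed from the full sample.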
